Preface

This  report  contains  the  papers  presented  at  the  conference  on  Small-Area  Statistics  in  Houston,  Texas, 
on  August  11,  1980,  during  two  sessions  of  the  annual  meeting  of  the  American  Statistical  Association 
(ASA),  which  was  held  jointly  with  the  Biometric  Society,  ENAR,  and  WNAR. 

The  first  session  of  the  1980  conference  concerned  Small-Area  Population  Estimates-Methods  and  Their 
Accuracy.  Jonah  Otelsberg  organized  this  session  and  Edward  J.  Spar  chaired  it.  The  speakers  were  Fred 
Cavanaugh  and  David  S.  Swanson,  and  the  discussant  for  both  papers  was  Howard  Martin. 

The  second  session  dealt  with  the  New  Metropolitan  Area  Definitions  and  Their  Impact  on  the  Private 
and  Public  Sectors.  Jonah  Otelsberg  organized  this  session  and  Joseph  Duncan  chaired  it.  Papers  were 
prepared  by  Jonah  Otelsberg,  Joseph  Duncan,  Edward  J.  Spar,  Arnold  P.  Reznek,  Randall  K.  Spoeri,  and 
John  Morawetz. 

In  addition,  this  report  includes  a  paper  prepared  by  Joseph  Waksberg,  Ralph  DiGaetano,  Richard  Yaffe, 
and Ellen MacKenzie and presented at a session held on August 14, 1980, concerning Small-Area Estimation,
organized  by  Maria  Elena  Gonzalez  and  chaired  by  Innis  G.  Sande. 

This  report  was  organized  and  prepared  by  Nancy  James  under  the  direction  of  Alice  Winterfeld, 
Chief,  Geographic  Statistical  Areas  Branch. 
Small-Area Population Estimates-Methods and Their Accuracy


Introduction 

Edward  J.  Spar 
Marketing  Statistics,  Inc. 

Prior  to  the  1970's,  a  session  such  as  this  would  probably  be  attended  only  by  statisticians  interested 
in theoretical problems. Small-area statistics was not an area which would have drawn large crowds. Once
the  Bureau  of  the  Census  was  given  the  responsibility  for  providing  small-area  population  estimates  for 
Revenue  Sharing  purposes,  interest  in  this  area  changed  considerably. 

There  are  two  major  population  shifts  that  have  taken  place  in  the  past  decade.  First,  we  have  seen  the 
migration  of  many  Americans  to  the  south  and  west.  Second,  we  have  had  a  large  and  unknown  number  of 
Hispanic  inmigrants.  To  what  degree  do  the  estimates  from  the  Bureau  account  for  these  events?  Will  the 
1980  census  give  us  the  answers?  Probably  only  partially.  At  this  time,  however,  the  estimates  from  the 
Bureau of the Census are generally accepted as being reasonable.

This morning, we will hear from the Bureau as to how they plan to test their estimates once the 1980
census  is  available.  We  will  also  have  a  discussion  on  allocation  accuracy  as  it  pertains  to  Federal  funding 
estimates.  Finally,  we  will  have  a  discussion  on  these  problems  from  a  user  of  the  data. 


The  Census  Bureau's  1980  Census  Test  of  Population  Estimates 

Frederick  J.  Cavanaugh 
U.S.  Bureau  of  the  Census 


INTRODUCTION 

At  no  time  in  history  have  population  estimates  played  as 
important  a  role  in  the  budgets  of  State  and  local  governmental 
units  as  they  have  during  the  1970's.  The  decade  of  the  1970's 
saw  the  proliferation  of  Federal  grant  programs  which  distribute 
billions  of  dollars  annually  to  State  and  local  governments 
using  population  updates  (or  population  estimates)  as  either 
the sole or one of several criteria variables in disbursing the
funds. In May 1978, the Congressional Research Service of the
Library of Congress identified 107 different Federal programs
which use population in fund allocation, either the decennial
census results or more recent postcensal population estimates.¹
The  more  prominent  of  these  Federal  programs,  which  came 
into  existence  during  this  period,  include  Federal  general  revenue 
sharing  (Public  Law  92-512,  "State  and  Local  Fiscal  Assistance 
Act of 1972"), the CETA program (Public Law 93-203,
"Comprehensive Employment and Training Act of 1972"), and the
Housing and Urban Development Community Block Grant
program (Public Law 93-383, "Housing and Urban Development
Act of 1972"). Because of the large amount of Federal funds
being  distributed  by  these  programs,  it  is  evident  that  the 
population  figures  used  need  to  be  as  accurate  as  possible. 
(Federal  general  revenue  sharing  alone  has  paid  out  to  State  and 
local  governments  approximately  $56  billion  in  its  short  6  years 
of  existence.) 

While  the  number  of  Federal  programs  which  use  population 
figures  as  criteria  variables  increased  dramatically  during  the 
1970's,  a  number  of  State  revenue  sharing  programs  were  also 
adopted  during  the  1970's.  However,  unlike  the  Federal 
programs  which  utilize  population  updates  for  proportional 
distribution  or  as  a  minimum  population  size  criterion,  most 
State  programs  disburse  funds  on  a  per  capita  basis,  thereby 
placing  even  more  of  an  onus  on  the  accuracy  of  population 
estimates. 

Because  of  the  tremendous  importance  already  placed  on 
population  estimates  and  the  anticipated  continuance  and 
possible increase in their use for fund distribution purposes
during the 1980's, it is imperative that the relative accuracy of
population  estimates  already  in  existence  be  ascertained. 

Although  an  attempt  has  been  made  to  answer  some  of  the 
questions  and  criticisms  raised  about  the  population  estimates 
used  during  the  1970's  through  tests  of  methods  against  special 
censuses  and  comparisons  of  the  results  of  several  different 
estimating techniques,² these results are not nearly conclusive
enough to make important decisions regarding possible
modification and retooling of the methods. For this type of
information, the results of the 1980 census are needed. Now that these
counts  near  completion,  we  are  ready  to  embark  on  a  full- 
scale  test  of  methods  which  should  provide  the  answers  needed 
to  determine  the  course  of  action  to  be  taken  during  the  1980's. 

DEFINITIONS 

The  term  "test  of  methods"  and  its  associated  concepts 
may  mean  different  things  to  different  readers.  In  order  to 
avoid  certain  ambiguities  caused  by  different  interpretations  of 
the  terms  and  concepts  involved,  it  will  be  helpful  to  define 
the  more  commonly  used  nomenclature.  The  terms  will  be 
defined for the 1980 test of methods, but are generally
applicable to any test of methods.

A  test  of  methods,  in  the  broadest  sense,  refers  to  (1)  the 
entire  process  of  developing  population  estimates  specific  to  a 
census  date,  (2)  comparing  the  estimates  to  the  same  census 
counts,  (3)  computing  the  "errors,"  and  (4)  applying  "accuracy 
measures"  to  the  errors.  (The  term  test  of  methods  is  somewhat 
imprecise  since  the  estimates  developed  from  a  particular  model 
are subjected to the test rather than the method itself.
Consequently, errors in the test may reflect errors in the data used
in  the  model  rather  than  errors  inherent  to  the  method.) 

A  complete  test  of  methods  refers  to  a  test  of  methods  in 
which  all  of  the  tasks  set  forth  prior  to  initiation  of  the  test 
of  methods  have  been  completed. 

A successful test of methods relates to a test of methods in
which sources of error have been isolated and either the errors
corrected or the method modified. Some would say that a test of
methods is successful only if the results indicate a very low level
of error. However, estimation error is an acknowledged fact of
population estimates work, and a test of methods whose results
suggest that a particular method or part of a method be
discarded is also a successful test of methods.

¹U.S. House of Representatives, Subcommittee on Census and
Population, Committee on Post Office and Civil Service, 95th Congress,
2nd Session, The Use of Population Data in Federal Assistance Programs,
Committee Print No. 95-16, U.S. Government Printing Office, Washington,
D.C., May 31, 1978.

²U.S. Bureau of the Census, Current Population Reports, Series
P-25, No. 699, Population and Per Capita Money Income Estimates for
Local Areas: Detailed Methodology and Evaluation, U.S. Government
Printing Office, Washington, D.C., June 1980.

The  differences  between  the  estimates  and  census  counts  are 
labeled  as  errors  although  they  are  not  errors  as  are  usually 
defined  in  statistical  literature  where  one  deals  with  specified, 
known,  or  assumed  distributions  and  their  associated  sampling 
error,  mean  square  error,  standard  deviation,  variance,  etc. 
Rather,  they  are  interpreted  as  the  level  of  difference  from  the 
accepted  standard,  usually  a  census.  In  order  to  quantify  the 
level of error for estimates from a given universe, several
different accuracy measures have gained acceptance. Among these
measures are the average absolute percent deviation, root mean
square error, size of estimation error, directional bias, and measures of
errors  in  estimating  distributions  (several  variations  of  the  Index 
of  Dissimilarity).  Each  taken  separately  has  some  drawbacks 
and  is  not  sufficient  to  completely  measure  accuracy.  However, 
taken  together,  the  various  measures  are  able  to  isolate  problems 
in  the  estimates. 
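As an illustration only, several of the measures named above can be sketched in a few lines of Python. The function names, the sample figures, and the exact formulas are assumptions made for this sketch (in particular, directional bias is taken here as the share of areas overestimated minus the share underestimated, and the Index of Dissimilarity as half the summed absolute differences between the two percent distributions); they are not necessarily the definitions the Bureau will apply in the 1980 test.

```python
from math import sqrt


def percent_errors(estimates, census):
    """Signed percent difference of each estimate from its census count."""
    return [100.0 * (e - c) / c for e, c in zip(estimates, census)]


def mean_absolute_percent_error(estimates, census):
    """Average absolute percent deviation across all areas."""
    errs = percent_errors(estimates, census)
    return sum(abs(x) for x in errs) / len(errs)


def root_mean_square_percent_error(estimates, census):
    """Root mean square of the percent errors; penalizes extreme errors."""
    errs = percent_errors(estimates, census)
    return sqrt(sum(x * x for x in errs) / len(errs))


def directional_bias(estimates, census):
    """Assumed definition: share overestimated minus share underestimated."""
    errs = percent_errors(estimates, census)
    over = sum(1 for x in errs if x > 0)
    under = sum(1 for x in errs if x < 0)
    return (over - under) / len(errs)


def index_of_dissimilarity(estimates, census):
    """Half the summed absolute differences between percent distributions."""
    total_e, total_c = sum(estimates), sum(census)
    return 0.5 * sum(
        abs(100.0 * e / total_e - 100.0 * c / total_c)
        for e, c in zip(estimates, census)
    )


# Hypothetical figures for four areas (not taken from any Bureau test).
census = [1000, 5000, 20000, 100000]
estimates = [1100, 4900, 20400, 99000]

mape = mean_absolute_percent_error(estimates, census)
rmse = root_mean_square_percent_error(estimates, census)
bias = directional_bias(estimates, census)
dissimilarity = index_of_dissimilarity(estimates, census)
```

In this hypothetical case the smallest area carries the largest percent error, which raises the root mean square figure above the average absolute figure even though overestimates and underestimates balance; this is one reason no single measure is sufficient on its own.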

HISTORICAL  PATTERNS 

Any  organization  which  carries  out  an  ongoing  population 
estimates  program  should  be  interested  in  the  relative  accuracy 
of  the  population  estimates  produced,  and  a  test  of  methods 
is  the  vehicle  for  accomplishing  this  task.  The  Census  Bureau 
has  had  a  population  estimates  program  for  over  30  years  and 
has  been  involved  in  tests  of  methods  against  the  1950,  1960, 
and  1970  decennial  censuses.  Each  census  represents  an 
expansion  over  the  previous  tests,  both  with  regard  to  the  scope 
of  the  test  (items  and  methods  tested)  and  the  number  of  areas 
for which estimates were tested. The results of these previous
tests suggest the following observations about the accuracy of
population estimates:³

1.  Size  of  population  is  a  major  element  in  determining  the 
expected  level  of  accuracy.  The  larger  the  area  in  terms 
of  population,  the  more  accurate  the  estimates.  Estimation 
errors for large areas (e.g., States and large metropolitan
areas) are relatively small (usually less than 2 percent), but
can exceed 20 percent for extremely small areas (areas with
fewer than 1,000 persons).⁴

2.  Rate  of  population  change  is  also  a  major  variable.  Areas  of 
less  rapid  change  usually  incur  smaller  average  errors  than 
those  undergoing  rapid  population  growth  or  decline. 

3. Estimates for States tend to have smaller errors than
estimates for counties which are, in turn, more accurate than
estimates for cities. This relationship reflects, of course, the
difference  in  sizes  of  population  involved  in  these  areas,  as 
well  as  constraints  on  the  ways  these  areas  can  grow. 

4.  Generally,  estimates  are  improved  when  results  of  two  or 
more  estimation  systems,  using  independent  data  input,  are 
averaged together. Averaging also tends to reduce the
number of extreme errors.

5.  Improvements  in  the  estimation  system  occur  when  smaller 
or  lower  levels  of  geography  are  controlled  to  higher  levels 
of  geography. 

6.  There  is  some  regional  variation  in  the  accuracy  of  estimates, 
although  the  differences  may  reflect  more  population  size 
distribution differentials, rates of change, and other
characteristics of subregional geographic units rather than actual
regional  geographic  differences  in  estimation  potentiality. 

7.  Estimates  for  all  levels  of  geography  appear  to  be  more 
accurate  in  the  most  recent  decade  as  opposed  to  earlier 
periods. 

Although  there  is  no  guarantee  that  the  results  of  the  1980 
census  test  of  methods  will  follow  these  trends,  it  would, 
indeed,  be  surprising  if  they  differed  substantially  from  the 
past. 


THE  1980  TEST  OF  METHODS 

The  conduct  of  an  efficient,  large-scale  test  of  methods,  like 
the  decennial  census  itself,  requires  a  considerable  amount  of 
planning  which  needs  to  begin  well  in  advance  of  the  census 
date.  In  the  case  of  the  test  of  methods  to  be  conducted  by  the 
Census  Bureau  against  the  results  of  the  1980  census,  the  initial 
planning  was  begun  in  mid-1978. 

Once  there  was  agreement  among  the  Bureau's  staff  assigned 
to  work  on  the  test  as  to  scope  and  direction  of  the  test,  a 
skeletal  framework  was  drawn  up  and  distributed  to  State 
participants  in  the  Federal-State  Cooperative  Program  for  Local 
Population Estimates (FSCP) in October 1978.⁵ The
participating State agencies were asked to review the outline and offer
any other specific items they would like to see included in the
test. Those comments received from the participants and
meriting attention were then incorporated, and a rather
comprehensive outline of the proposed test of methods emerged. The
final outline is shown in the appendix.

³Meyer Zitter and Frederick J. Cavanaugh, "Postcensal Estimates of
Population," a paper presented at the Annual Meeting, American
Association for the Advancement of Science, Session on the 1980 Census,
San Francisco, California, January 5, 1980. (Copies of this paper can
be obtained by writing: Chief, Population Division, U.S. Bureau of the
Census, Washington, D.C. 20233.)

⁴Over one-half of the 39,000 revenue sharing areas in the country
had populations of less than 1,000 in 1978.

⁵The general term Federal-State Cooperative Program is used to relate
to any one of several programs entered into jointly and cooperatively
between the Federal government and representatives of individual State
governments. At the Census Bureau, there are two such programs: one
for population estimates and one for population projections. Although
the initials FSCP stand for either program at the Bureau, as used here it
refers only to the population estimates program. For further information
on the Federal-State Cooperative Program for Local Population Estimates,
see: Meyer Zitter, "Federal-State Cooperative Program for Local
Population Estimates," The Registrar and Statistician, Vol. 33, No. 11, January
1968, pp. 4-6; Meyer Zitter, "Federal-State Cooperative Program for
Local Population Estimates: Status Report, January 1971," The Registrar
and Statistician, Vol. 36, No. 4, April 1971; U.S. Bureau of the Census,
Current Population Reports Series P-26, No. 21, Federal-State Cooperative
Program for Local Population Estimates: Test Results-April 1, 1970, by
Frederick J. Cavanaugh and Linda C. Braun, U.S. Government Printing
Office, Washington, D.C., April 1973; and U.S. Bureau of the Census,
Current Population Reports Series P-26, No. 118, Federal-State
Cooperative Program for Local Population Estimates-Status Report: January
1975, by Frederick J. Cavanaugh, U.S. Government Printing Office,
Washington, D.C., July 1975.

This  final  outline  represents  a  rather  ambitious  undertaking. 
If  time,  staff,  and  funds  were  no  object,  this  test  could  be 
completed.  However,  to  be  realistic  and  to  insure  that  the  more 
important  aspects  of  the  test  be  thoroughly  documented,  the 
scope  must  be  narrowed  sharply.  In  order  to  make  the  best 
use  of  the  available  resources,  priorities  had  to  be  placed  on  the 
various  items  in  the  outline. 

In  order  to  obtain  a  basic  overview  of  the  test  results,  it  is 
anticipated  that  a  preliminary  phase  of  the  test  will  involve 
extrapolating  the  most  recent  estimates  (i.e.,  those  for  July  1, 
1978)  at  the  State,  county,  and  the  subcounty  levels  to  the 
April  1,  1980,  census  date  and  comparing  these  with  the  census 
preliminary counts. Although a preliminary test is not
conclusive, it does serve a useful purpose in indicating what can
be expected from the full test and allows the Bureau to get out
some results in a relatively short period of time (hopefully no
later than the spring of 1981). It is expected that the results of
this  preliminary  test  phase  would  be  released  as  technical  papers 
delivered  at  annual  meetings  of  professional  societies  such  as 
the  Population  Association  of  America,  the  American  Statistical 
Association,  and  the  Southern  Regional  Demographic  Group. 
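The text does not say how the July 1, 1978, estimates are to be carried forward to the census date. Purely as a sketch, the following assumes a straight line through two dated estimates, which covers both extrapolation beyond the later date and the linear interpolation to April 1 used in past tests. The function name and the population figures are hypothetical.

```python
from datetime import date


def linear_to_date(pop_a, date_a, pop_b, date_b, target):
    """Evaluate the straight line through two dated estimates at `target`.

    A target between the two dates gives an interpolation; a target
    after the later date gives an extrapolation.
    """
    span_days = (date_b - date_a).days
    fraction = (target - date_a).days / span_days
    return pop_a + fraction * (pop_b - pop_a)


# Hypothetical county: extrapolate 1977 and 1978 estimates to the census date.
extrapolated_1980 = linear_to_date(
    50000, date(1977, 7, 1), 51000, date(1978, 7, 1), date(1980, 4, 1)
)

# Hypothetical county: interpolate between July 1, 1969 and July 1, 1970.
interpolated_1970 = linear_to_date(
    10000, date(1969, 7, 1), 10365, date(1970, 7, 1), date(1970, 4, 1)
)
```

A straight line is only one possible assumption; a constant rate of growth (geometric change) would give somewhat different figures for rapidly changing areas.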

As  additional  data  become  available  both  in  terms  of  the 
estimates  (i.e.,  revised  July  1,  1979,  and  provisional  April  1, 
1980,  population  estimates)  and  the  1980  census  itself,  a  more 
refined  test  at  the  State  and  county  levels  will  be  produced  and 
compared against the published census counts.⁶ When all data
are  available  to  prepare  the  revised  April  1,  1980,  estimates, 
additional  items  in  the  test  will  be  performed. 

Specific  Priority  Levels 

The  following  will  give  the  reader  some  idea  of  the  items 
included  in  the  various  levels  of  the  proposed  test: 

The  first  (or  basic)  level.— Simply  put,  the  first  or  basic  priority 
level  will  contain  tests  of  those  methods  as  they  were  used 
during  the  decade  of  the  1970's.  No  innovation  or  modification 
to  the  methods  employed  in  the  1970's  will  be  incorporated. 
At  this  level,  tests  of  methods  conducted  for  States  and  counties 
against the 1970 census results will be repeated.⁷ While the
1970 tests included the 50 States, the District of Columbia, and
approximately 2,600 counties or county equivalents, the 1980
test will include all of the approximately 39,000 governmental
units which participate in the Federal general revenue sharing
program, a fifteenfold increase in the number of geographic
units for which the test is to be performed.

⁶Producing the estimates specific to April 1, 1980, is a departure
from the procedures used in the past when, for example, July 1, 1969,
and July 1, 1970, population estimates were developed and then linearly
interpolated to the April 1, 1970, census date.

⁷U.S. Bureau of the Census, Current Population Reports Series
P-25, No. 520, Estimates of the Population of States with
Components of Change, 1970 to 1973, by David L. Word, U.S. Government
Printing Office, Washington, D.C., June 1974; U.S. Bureau of the Census,
Current Population Reports Series P-25, No. 640, Estimates of the
Population of States with Components of Change: 1970 to 1975, by
David L. Word, U.S. Government Printing Office, Washington, D.C.,
November 1976; U.S. Bureau of the Census, Current Population Reports
Series P-25, No. 734, Estimates of the Population of States, by Age:
July 1, 1971 to 1977, U.S. Government Printing Office, Washington,
D.C., November 1978; and U.S. Bureau of the Census, Current Population
Reports Series P-26, No. 21, op. cit.

At  the  State  and  county  levels,  total  population  estimates 
based  on  the  three  standard  estimating  techniques  used  during 
the  1970's  (Component  Method  II,  the  regression  (ratio- 
correlation)  method,  and  the  Administrative  Records  method), 
as  well  as  various  combinations  of  these  methods  will  be  tested 
under the first-priority level. In addition, at the State level, age
and  race  estimates  will  also  be  tested. 

For  the  most  part,  the  only  method  used  to  estimate  the 
populations  of  subcounty  areas  has  been  the  Administrative 
Records  method,  and  this  method  will  receive  extensive  testing 
under  the  first-priority  level.  In  the  six  States  (California, 
Florida,  New  Jersey,  Oregon,  Washington,  and  Wisconsin) 
where  State-prepared  local  estimates  were  averaged  with  the 
Administrative  Records  method  estimates,  both  the  average  and 
the State-prepared estimates will also be tested.⁸

Although  estimates  of  the  racial  composition  of  metropolitan 
areas  have  not  been  a  regular  part  of  the  Bureau's  estimates 
program during the 1970's, developmental work using Federal income
tax  return  and  Social  Security  data  shows  promise.  Because  of 
the  potential  for  a  regular  race  estimates  program  during  the 
1980's,  it  has  been  suggested  that  the  testing  of  these  estimates 
should  also  be  given  a  high  priority. 

The only innovation that is planned for testing in the
first-priority level test is the manner in which the estimates
based on independent methods are averaged together for States
and counties. In the past, it was customary to use a simple
arithmetic average of the estimates since it has been shown that the
results of simple averaging are significantly more accurate than
estimates based on any single estimation technique. However,
it has been suggested that there may be a better way
of combining the estimates based on several independent
methods. Currently, work is proceeding on a modification of
a procedure proposed by Ericksen.⁹ Briefly, this method would
entail  the  use  of  multiple  regression  analysis  with  estimates 
based  on  the  individual  methods  as  the  independent  variables 
and  the  census  count  as  the  dependent  variable. 
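The idea in the last sentence can be sketched as an ordinary least-squares fit, with the simple arithmetic average shown for contrast. This is not Ericksen's procedure, nor the Bureau's modification of it, neither of which is specified here; the intercept term, the function names, and the figures (in thousands) are all assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_combination_weights(method_rows, census_counts):
    """Regress census counts on the method estimates (normal equations).

    method_rows holds one row per area, one column per method.
    Returns [intercept, weight_1, weight_2, ...].
    """
    rows = [[1.0] + list(r) for r in method_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, census_counts)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def combine(coefs, row):
    """Weighted combination of one area's method estimates."""
    return coefs[0] + sum(w * v for w, v in zip(coefs[1:], row))


def simple_average(row):
    """The customary unweighted mean of the method estimates."""
    return sum(row) / len(row)


# Hypothetical estimates (thousands) from two methods for five areas.
method_rows = [(10.0, 14.0), (50.0, 45.0), (200.0, 230.0),
               (1000.0, 900.0), (30.0, 40.0)]
# Hypothetical census counts, built so that the best weights are 0.6 and 0.4.
census_counts = [0.6 * a + 0.4 * b for a, b in method_rows]

coefs = fit_combination_weights(method_rows, census_counts)
```

With these constructed figures the fit recovers weights of about 0.6 and 0.4 and an intercept near zero, whereas the simple average implicitly fixes both weights at 0.5. In practice the weights could only be fitted against a past census and then applied to later estimates.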

Even  if  no  further  tests  were  made,  this,  by  far,  would  still 
constitute  the  largest  test  of  methods  ever  undertaken  anywhere. 

Second-level  priorities.— Once  the  overall  test  of  how  well  (or 
how poorly) the estimates performed during the 1970's is
completed, it then follows that isolation of those components
contributing the most to estimation error would be in order.
A first procedure for isolating estimation error would be
dropping internal controls (i.e., those pro-rata adjustments made
to control to higher levels of geography) used in the various
methods and selectively repeating some of the steps outlined
under the first-priority level. Although past tests of methods
strongly indicate that the use of such controls improves the
accuracy of the estimates, it is possible that one or more of the
controls currently in use may bias the estimates.

⁸State-prepared local estimates are developed by the FSCP
participating State agency.

⁹Eugene P. Ericksen, "A Method for Combining Sample Survey
Data and Symptomatic Indicators to Obtain Estimates for Local
Areas," Demography, Vol. 10, No. 4, May 1973, pp. 137-160; and
Eugene P. Ericksen, "A Regression Method for Estimating Population
Changes for Local Areas," Journal of the American Statistical Association, Vol.
69, No. 348, December 1974, Applications Section, pp. 847-875.
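A pro-rata control of the kind referred to in the discussion of internal controls can be sketched as a single proportional adjustment. The function name and figures below are hypothetical, and the sketch assumes one uniform scaling factor for all sub-areas.

```python
def control_to_total(sub_estimates, control_total):
    """Scale sub-area estimates proportionally to sum to the control total."""
    raw_total = sum(sub_estimates)
    factor = control_total / raw_total
    return [estimate * factor for estimate in sub_estimates]


# Hypothetical county estimates that sum to 100,000, controlled to an
# independently derived State estimate of 102,000.
uncontrolled = [12000.0, 30000.0, 58000.0]
controlled = control_to_total(uncontrolled, 102000.0)
```

Dropping the control in a test amounts to comparing both `uncontrolled` and `controlled` against the census counts; if the higher-level total is itself in error, the adjustment passes that error on to every sub-area in proportion to its size.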

Another  source  of  estimation  error  may  stem  from  errors  in 
the  administrative  data  inputs  to  the  methods.  It  is  anticipated 
that  items  in  the  1980  and  1970  censuses  comparable  to  the 
administrative  data  series  collected  routinely  (e.g.,  elementary 
school  enrollment,  institution  and  college  populations,  military 
barracks,  military  station  strength  figures)  will  be  compared. 

A  mid-decade  test  of  the  Administrative  Records  method 
estimates  against  the  results  of  approximately  2,000  special 
censuses  for  subcounty  areas  indicated  a  relatively  large  error 
rate for small places (those under 1,000 in population).¹⁰
Although  special  census  areas  are  somewhat  atypical  (in  the 
sense  that  the  communities  requesting  the  special  censuses  must 
pay the costs and, therefore, are likely to have had rapid
population gains and larger error rates), it is not anticipated that the
error rates in the 1980 census test will be substantially lower for these small
areas.  Consequently,  an  alternative  method  or  modifications  to 
the  Administrative  Records  method  may  be  in  order  for  these 
areas.  The  Census  Bureau's  Construction  Statistics  Division 
collects  information  on  building  and  demolition  permits  issued 
by over 16,000 permit issuing jurisdictions. Many of these
jurisdictions fall into this small area size category and these data,
used  in  conjunction  with  data  from  the  1970  census,  would 
permit  the  calculation  of  Housing  Unit  method  estimates  for 
these  areas.  The  results  of  the  Housing  Unit  method  would  be 
compared  to  the  corresponding  Administrative  Records  method 
estimates  and  to  the  1980  census  results  for  small  areas.  (It 
should be noted that not all areas with populations under 1,000
are  covered  by  the  building  permit  data,  so  conclusions  drawn 
for  those  areas  with  permit  data  may  have  to  stand  for  all  areas 
in  this  size  class.) 
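The text does not spell out the Housing Unit method. The sketch below assumes its common textbook form: the base-census housing stock is updated with permits and demolitions, then converted to population with an occupancy rate and an average household size, with any group quarters population added. The function name, parameters, and figures are all assumptions for illustration.

```python
def housing_unit_estimate(base_units, permits, demolitions,
                          occupancy_rate, persons_per_household,
                          group_quarters=0.0):
    """Population estimate from an updated housing stock (assumed form)."""
    # Update the housing stock counted at the base census.
    units = base_units + permits - demolitions
    # Convert housing units to occupied households.
    households = units * occupancy_rate
    # Convert households to persons and add group quarters residents.
    return households * persons_per_household + group_quarters


# Hypothetical place with fewer than 1,000 persons.
small_place = housing_unit_estimate(
    base_units=300, permits=40, demolitions=10,
    occupancy_rate=0.9, persons_per_household=3.0, group_quarters=9.0,
)
```

In this form the estimate is sensitive to the occupancy rate and household size carried forward from the base census, which may themselves have changed over the decade.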

Another  factor  which  might  contribute  heavily  to  the  large 
average  error  in  the  Administrative  Records  method  for  small 
areas  is  the  problem  of  correctly  identifying  geographic  areas 
of  small  population  size  on  the  basis  of  individual  Federal 
income tax returns. Some possible alternatives to the
place-specific migration rate now calculated for small areas by this
estimation system may also be tested. Modifications and
refinements to other methods for all geographic areas would also be
introduced at this stage.

Third-level  priorities.— While  the  first-level  priorities  are  certain 
to  be  completed  and  the  second-level  priorities  have  a  very 
good  chance  of  completion,  there  is  only  about  a  50-50  chance 
for  completion  of  the  third-level  priorities. 

At  the  top  of  this  level  is  a  test  designed  to  decompose  the 
error associated with independent methods, particularly
Component Method II, into error associated with the model and
error attributable to the administrative data series. This would
be accomplished by a multistep procedure. In the first step, all
administrative input data would be replaced with corresponding
figures from the 1970 and 1980 censuses. The difference
between these estimates and the corresponding 1980 census
counts would be the error due to the model. Then, by selectively
replacing the census comparable data (one item at a time) by
administrative data, the error for each data series could be
determined. Of course, if the 1980 census undercount
deviates significantly, either in magnitude or differentially by
characteristic from 1970, this test would be meaningless since
the  differences  in  undercount  would  impact  the  model  error. 
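The multistep procedure can be sketched generically as follows. The estimating model here is a toy stand-in, not Component Method II, and the series names and figures are hypothetical; the sketch also assumes that the error for each series is measured as the change from the all-census baseline when that one series is swapped in.

```python
def decompose_error(model, census_inputs, admin_inputs, census_count):
    """Split estimation error into a model part and one part per series.

    `model` is any callable taking a dict of input series and returning
    a population estimate.
    """
    # Step 1: run the model on census-comparable inputs only.
    baseline = model(census_inputs)
    errors = {"model": baseline - census_count}
    # Step 2: swap in administrative data one series at a time.
    for name, admin_value in admin_inputs.items():
        swapped = dict(census_inputs)
        swapped[name] = admin_value
        errors[name] = model(swapped) - baseline
    return errors


def toy_model(inputs):
    """Hypothetical stand-in for an estimating method."""
    return (1000 + inputs["births"] - inputs["deaths"]
            + 4.0 * inputs["enrollment_change"])


census_inputs = {"births": 120, "deaths": 80, "enrollment_change": 10}
admin_inputs = {"births": 125, "deaths": 80, "enrollment_change": 12}

errors = decompose_error(toy_model, census_inputs, admin_inputs, 1100)
```

Swapping one series at a time ignores any interaction between series, so the parts need not sum exactly to the total error when the model is not additive.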

For  approximately  130  counties  and  over  2,000  subcounty 
areas,  errors  of  estimates  in  1980  can  be  compared  to  errors 
developed  from  special  censuses  conducted  around  mid-decade. 
Such  error  trend  information  may  open  the  door  to  a  more 
thorough  understanding  of  how  errors  are  distributed  within  a 
decade. 

Another  possible  test  for  areas  below  the  county  level  would 
indicate  how  annexations  and  boundary  changes  impact  the 
accuracy of the estimates for these areas. It may well be that
those portions of cities and towns annexed after the last
decennial census have population growth patterns which are
quite different from those of the portions which were in existence at the time
of the census.

Lower  priority  levels.— Most  of  the  items  assigned  to  these  lower 
levels  could  be  categorized  as  experimental  in  nature.  Many 
deal with fine tuning of methods which are already quite
precise, and if they have any significant potential for additional
accuracy, that potential, rather than the time limits imposed by
this particular test, will dictate when they are tested.


Publication  of  the  Results 

Prior to the 1970 census, results of tests of methods
conducted by the Census Bureau were only released through papers
prepared  by  Bureau  staff  members  and  presented  at  annual 
meetings  of  professional  societies.  While  this  practice  was 
carried  out  to  some  degree  with  the  results  of  tests  of  methods 
against  the  1970  census,  the  bulk  of  the  test  results  appeared  in 
four separate official Census Bureau reports.¹¹ The wider
publication area reflects the increased interest in the Bureau's
population estimates from a clientele whose needs differ from those of
demographic and statistical professionals and academicians. With
even more interest and emphasis placed on the Bureau's
estimates in the 1970's and because of the use of the estimates in
Federal grant programs, it will be even more important to
address the test results to a wide audience. Preliminary test
results  will,  in  all  probability,  go  the  route  of  pre-1970  tests 
with  papers  and  articles  in  the  professional  media,  since  these 
results  are  not  definitive  but  only  give  indication  of  the  final 
tests. Since some of the preliminary test results may be
speculative, the proper exposure would be to those groups which
are  familiar  with  the  problems  inherent  with  provisional  results. 


¹⁰U.S. Bureau of the Census, Current Population Reports Series
P-25, No. 699, op. cit.

¹¹U.S. Bureau of the Census, Current Population Reports Series
P-25, No. 520, op. cit.; U.S. Bureau of the Census, Current Population
Reports Series P-25, No. 640, op. cit.; U.S. Bureau of the Census, Current
Population Reports Series P-25, No. 734, op. cit.; and U.S. Bureau of
the Census, Current Population Reports Series P-26, No. 21, op. cit.


For  the  findings  of  the  regular  test  results,  we  are  currently 
considering  several  individual  reports  in  an  appropriate  Bureau 
publication  vehicle,  although  neither  has  the  exact  number  of 
reports  been  decided  upon  nor  has  the  exact  publication  series 
been  selected  at  this  time.  The  current  plans  call  for  up  to  two 
publications  for  the  tests  of  the  provisional  estimates— maybe 
one  for  States  and  one  for  counties.  It  is  then  planned  that  the 
next  three  publications  would  contain  the  results  of  the  revised 
estimates  (one  report  each  for  States,  counties,  and  subcounty 
areas).  A  possible  sixth  report  would  reproduce  the  first  few 
reports  in  a  single  volume,  make  comparisons  of  the  provisional 
and  revised  estimates  tests,  and  tie  together  the  entire  test. 
Reports  would  be  published  on  a  flow  basis  as  the  test  results 
for  each  segment  are  completed,  but  their  exact  order  has  not 
been  decided  upon  at  this  time.  Of  course,  as  we  get  into  the 
actual  test,  this  tentative  publication  schedule  may  change 
considerably. 
Appendix


1980  TESTS-OUTLINE 


Part  I— The  Test  Structure 

A.  State  Estimates 

1.  Test standard techniques as they have been used in the 1970's (i.e., Component Method II, the
Regression method, and the Administrative Records method).

a.  Singly.

b.  All  possible  combinations. 

c.  Special  cases— Alaska,  Connecticut,  the  District  of  Columbia,  etc. 

2.  Test standard techniques incorporating the "rake/float" adjustments from the FSCP county
estimates and the "revenue sharing" estimates. Test each method singly with comparisons to
A.1.a. above.

3.  Possible  sources  of  error. 

a.  Possible  modeling  errors. 

b.  Possible  input  data  errors— Comparison  of  change  in  synthetic  variables  with  change  in  census 
variables. 

c.  Impact  of  data  errors  on  the  overall  error  of  the  estimates. 

4.  Test  modifications  to  standard  techniques  —  Potential  modifications. 

a.  Component  Method  II— Total  population. 

b.  Component  Method  II— Specific  age  groups. 

c.  The  Regression  (ratio-correlation)  method. 

d.  The  Administrative  Records  method. 

5.  "Best  estimates"— Estimates  developed  using  intuitive  modifications  based  on  knowledge  and 
observation  of  an  area. 

6.  Casting out of outliers— Remove from consideration any estimate which either:

a.  Exceeds  the  average  of  estimates  (in  absolute  terms)  by  5  percent  or  more. 

b.  Runs  counter  to  the  trend  established  by  the  other  two  methods. 

7.  Optimum  weighting— Produce  regressions  for  "optimum  weighting"  of  methods. 

a.  Comparison  of  estimates  against  census. 

b.  Statistical comparison of beta coefficients to $\beta_1 = \beta_2 = 0.50$.

8.  Trend  in  regression  beta  coefficients  over  time. 

a.  One-to-one  comparisons. 

b.  Statistical  significance  of  differences. 

9.  Test  State  age  estimates. 

a.  As  they  were  computed  in  the  1970's. 

b.  Using  Regression  analysis. 

10.   Test  State  race  estimates. 


B.  County  Estimates 

1.  Test  of  standard  estimating  techniques  used  in  the  1970's  (excluding  adjustments  for  special 
censuses). 

a.  Singly.

b.  All  possible  combinations. 

2.  Adjustments  for  place  and  county  special  censuses. 

a.  Repeat B.1.a. and B.1.b. using the "rake/float" adjustments for place special censuses only.

b.  Repeat B.1.a. and B.1.b. using the "rake/float" adjustments for counties with special censuses
only.

c.  Repeat  all  of  the  above  using  the  "rake/float"  adjustments  for  both  place  and  county  special 
censuses. 

3.  Possible  sources  of  error. 

a.  Possible modeling errors— Repeat same as A.3.a. at the county level.

b.  Possible input data errors— Repeat same as A.3.b. at the county level.

c.  Impact of data errors on the overall error of the estimates— Repeat same as A.3.c. at the county
level.

4.  Error rate trends: comparison of deviations from mid-1970's special censuses with deviations from
1980 census. (Restricted to only those counties with special censuses conducted some time during
the 1970's.)

5.  Test  of  modifications  to  standard  methods  —  Modifications  to  individual  standard  methods. 

a.  Component  Method  II. 

b.  The  Regression  (ratio-correlation)  method. 

c.  The  Administrative  Records  method. 

d.  Housing  Unit  method  (large  metropolitan  counties  only). 

6.  Test  special  populations. 

a.  Institutions  and  colleges. 

b.  Military  barracks. 

c.  Military  station  strength. 

7.  Test  new  "experimental"  methods. 

a.  The  Difference  correlation  (O'Hare  procedure). 

b.  Percent  change  (modified  Ericksen  approach). 

c.  Straight  regression  (standard  textbook  approach). 

d.  Ridge  regression— Repeat  all   tests  of  ratio-correlation,  ratio  difference,  percent  increase,    and 
straight  regression  using  the  ridge  regression  approach. 

e.  Dummy  variables— Repeat  all  of  items  B.7.a.  through  B.7.d.  using  dummy  variables  for  size  of 
county,  growth  rates,  and  metropolitan-nonmetropolitan  residence. 

f.  Namboodiri Simple Regression.

g.  New  "Composite"  methods. 

8.  Test  NCI  age-sex-race  estimates. 
C.  Subcounty estimates

1.  Test  standard  procedures. 

a.  Singly.

b.  Combinations— where  estimates  based  on  other  estimating  techniques  are  available  or  can  be 
developed. 

2.  Possible  sources  of  error. 

a.  Possible modeling errors— Repeat same as A.3.a.

b.  Possible  input  data  errors— Repeat  same  as  A.3.b.  Compare  3/4  of  estimated  1979  births  plus 
1/4  of  estimated  1980  births  reduced  by  the  appropriate  survival  factors  to  the  population  under 
1  year  old  recorded  in  the  1980  census. 

c.  Impact of data errors on the overall error of the estimates— Repeat same as A.3.c.

3.  Impact  of  adjustments  for  boundaries  and  annexations. 

a.  Compute  Administrative  Records  method  estimates  excluding  adjustments  for  boundary  changes 
and  annexations. 

b.  Compare  deviations  for  areas  with  annexations  and  boundary  adjustments,  both  with  and  without 
adjustments,  against  deviations  for  areas  without  boundary  changes  and  annexations. 

4.  Modifications  to  Administrative  Records  method. 

a.  Compute  Administrative  Records  method  estimates  taking  into  account  adjustment  factors  for 
differential filing patterns by race. Repeat item C.1. above using these estimates.

b.  For  places  under  1,000  population  in  1970,  use  total  exemptions  on  Federal  individual  income 
tax  returns  as  estimates  and  compare  deviations  with  deviations  of  standard  and  modified 
Administrative  Records  method  estimates. 

c.  Expand  time  intervals  between  matched  years.  Recompute  Administrative  Records  method 
estimates and repeat items C.1. and C.2. above. (Restricted to a 1-percent sample of individual
Federal  income  tax  returns.) 

d.  Compute  Administrative  Records  estimates  separately  for  the  under  65  year  old  population  and 
the  population  65  years  old  and  over.  Test  total  population  and  age  groups  separately. 

e.  Drop all controls to the county level and repeat item C.1. above.

5.  Housing  Unit  method  (16,000  areas  only). 

a.  Compare  estimates  of  total  housing  inventory  and  occupied  housing  units  with  comparable 
measures  from  the  1980  census. 

b.  Recompute  Housing  Unit  method  persons  per  household  using  adjustments  (same  adjustments 
as  shown  under  Housing  Unit  method  for  counties). 

6.  Error  trends— Compare  1980  deviations  with  deviations  from  special  censuses  conducted  in  the  mid- 
1970's. 

7.  Best  test  for  the  Administrative  Records  method. 

a.  Make  single  passes  and  compare  with  the  1980  census. 

b.  Take  intersection  of  all  of  items  under  C.7.a.  and  compare. 

8.  Alternative estimates for places with populations under 1,000.

a.  Replace  the  net  migration  rate  for  places  under  1,000  population  derived  using  the  procedures 
of  the  1970's  with  other  measures  of  net  migration  including  the  pooled  rates  for  all  small  areas 
and  for  some  larger  encompassing  geographic  area. 

b.  Apply  ratio  of  population  in  the  1970  census  to  the  number  of  total  exemptions  on  individual 
Federal  income  tax  returns  for  tax  year  1969  to  the  number  of  total  exemptions  on  individual 
Federal  income  tax  returns  for  tax  year  1979. 

c.  Assume  no  migration  (population  only  affected  by  natural  change). 
Part II— Tabulation Categories

A.  All  tabulations  and  computations  of  estimates  will  be  specific  to  April  1,  1980. 

B.  Controls. 

1.  "Best estimate."

a.  States  to  national  control. 

b.  Counties  to  State  control. 

c.  Subcounty  areas  to  county  controls. 

2.  1980 census— same categories as B.1.

3.  Uncontrolled. 

4.  Control  counties  to  national  control— get  State  estimates  by  summation. 

5.  Control  subcounty  estimates  to  State  controls— get  county  estimates  by  summation. 

6.  Control  subcounty  estimates  to  national  controls— get  State  and  county  estimates  by  summation. 

C.  Two  tests. 

1.  Provisional estimates.

2.  Revised  estimates. 

D.  Summary  tabulation  categories. 

1.  Regions. 

2.  Divisions. 

3.  States  by  size  of  State  (1970). 

4.  Counties. 

a.  By  size  of  county  (1970). 

b.  By  growth  rate  (1970  to  1980). 

c.  By  metropolitan  status  (1980). 

5.  Number  of  positive  deviations. 

6.  Areas  by  size  of  deviation. 

7.  Signed  deviation  by  signed  growth  rate. 

8.  Subcounty areas— same as items D.1. through D.7. above.

Part  III— Report  Presentation 

A.  Reports  on  provisional  estimates. 

1.  States.

2.  Counties. 

3.  Subcounty  areas. 

B.  Reports  on  revised  estimates. 

1.  States.

2.  Counties. 

3.  Subcounty  areas. 

C.  Summary  and  comparison  of  provisional  and  revised. 

D.  Overall  report  of  all  the  above  reports. 
ABSTRACT

An annual set of county population estimates is an important
service provided by State Demographic Centers and the U.S.
Bureau  of  the  Census.  The  standard  techniques  used  for  these 
estimates,  although  not  usually  acknowledged  as  such,  can  be 
viewed  as  a  procedure  used  to  allocate  a  State's  population 
among  its  counties.  While  the  accurate  allocation  of  a  State's 
population  is  important  in  its  own  right,  it  has  implications  for 
the  revenue  sharing  programs  of  the  Federal  government  and 
many  States  because  an  annual  population  estimate  is  an 
important— and  for  some  States  the  sole— component  in  for- 
mulas used  for  allocating  funds  to  counties  and  their  recognized 
governmental  subunits.  This  report  argues  that  the  concept  of 
allocation  accuracy  should  be  routinely  applied  to  population 
estimates  used  for  fund  allocations  and,  further,  describes  a 
summary  measure  of  allocation  accuracy  termed  the  Index  of 
Misallocation.  It  is  also  argued  that  both  the  ratio-correlation 
form  of  multiple  regression,  a  standard  technique  used  for 
population  estimates,  and  the  difference-correlation  form,  a 
variant  recently  receiving  attention,  have  in  common  two 
distortions  that  render  incomplete  the  traditional  regression 
model evaluation criteria such as R² and S.E.E. Using these two
regression  methods  in  conjunction  with  county  data  for 
Washington  State,  the  report  demonstrates  the  utility  of  using 
the  Index  of  Misallocation  as  a  criterion  for  model  evaluation. 
The  report  concludes  that  the  Index  should  be  included  as  one 
of  the  standard  criteria  for  evaluating  both  these  two  regression 
forms  and  the  actual  allocation  accuracy  of  estimates  produced 
by  these  and  other  methods. 

INTRODUCTION 

While  accurate  county  population  estimates  are  important  in 
their  own  right,  they  provide  an  important  basis  for  allocating 
State and Federal general revenue sharing and other funds to
counties  and  are  usually  made  using  the  ratio-correlation  form 
of  multiple  regression  (Namboodiri,  1972;  O'Hare,  1976; 
Pursell,  1970;  Schmitt  and  Crosetti,  1954;  Serow  and  Martin, 
1977;  and  U.S.  Bureau  of  the  Census,  1973a);  although  the 
difference-correlation form of multiple regression (O'Hare,
1976;  Schmitt  and  Grier,  1966;  Spar  and  Martin,  1979;  and 
Swanson,  1978)  and  other  procedures  and  combinations  of 
procedures  may  be  used  (Bogue,  1950;  Namboodiri  and  Lalu, 
1971;  U.S.  Bureau  of  the  Census,  1973a;  and  Zitter  and 
Shryock,  1964).  These  estimates  of  population  for  revenue 
sharing  and  other  purposes  are  made  by  State  Demographic 
Centers  and  the  U.S.  Bureau  of  the  Census  (Engels,  1978; 
Rosenberg  and  Myers,  1977;  U.S.  Bureau  of  the  Census,  1973a; 
and  Walker,  1976). 

In  terms  of  State  revenue  sharing,  the  total  funds  available 
for  allocation  within  a  given  State  are  usually  fixed  by  legislative 
action  independently  of  population  estimates  and  are  based  on 
revenue  collections.  While  actual  funding  allocation  formulas 
may  utilize  additional  variables,  "There  are  many  states  that 
now  allocate  state-generated  revenues  as  well  as  selected  federal 
grants wholly or partially on the basis of demographic information..."
(Rosenberg and Myers, 1977). For example, in the
States  of  Florida  (Doolittle  and  Jones,  1974),  Washington 
(Washington,  1978),  and  Wisconsin  (Wisconsin,  1973)  current 
county  population  estimates  are  the  sole  component  for 
allocating  State  revenue  sharing  funds  directly  to  county 
governments,  while  in  California  (Hollman,  1979),  a  county 
population  estimate  is  a  component  of  its  revenue  sharing 
formula.  In  the  Federal  revenue  sharing  program,  county 
population  estimates  are  a  component  in  determining  the 
boundaries  for  revenue  sharing  funds  allocated  directly  to 
subcounty  units.  Funds  in  this  program  are  distributed  first 
among  States,  then  among  the  counties  in  each  State  and  finally 
actually  allocated  directly  to  recognized  subcounty  units  such 
as  municipalities  and  townships   (Heintz  and  Hart,  1973:20). 
(See also Auten, 1974; Barth et al., 1976; Tomer, 1977; U.S.
Bureau  of  the  Census,  1973b;  U.S.  O.F.S.P.S.,  1978;  and  U.S. 
Subcommittee  on  Census  and  Population,  1978.) 

Although  not  usually  acknowledged  as  such  (for  a  notable 
exception  see  Patil  et  al.,  1974),  these  county  population 
estimates  can  be  viewed  as  an  allocation  procedure,  which,  in 
general  terms  can  be  described  as  the  problem  of  distributing  N 
items  among  K  mutually  exclusive  and  exhaustive  categories, 
where  N  is  determined  independently  of  the  distribution 
procedure.  County  population  estimates  fall  within  this  defini- 
tion because  the  State  total  is  always  determined  independently 
of  the  county  population  estimates,  which,  as  has  been  implied, 
are  always  "controlled"  to  sum  to  the  State  total.  In  effect,  the 
actual  county  estimates  are  never  made  in  isolation  from  the 
current  estimated  State  total  and  the  historical  pattern  of 
population  distribution  by  county.  As  is  discussed  later  in  this 
report, these two factors have especially important consequences
for the use of the ratio-correlation and difference-correlation
forms of multiple regression, consequences that have
been too long overlooked. Where funds are allocated directly in
accordance  with  county  population  allocations,  the  population 
allocations  can  be  interpreted  directly  as  funding  allocations. 
Where  funds  are  allocated  using  variables  in  addition  to 
population  estimates  (whether  these  are  distributed  directly  to  a 
county government or, as in the case of Federal revenue sharing,
to the municipalities and other recognized subcounty units within
a given county), an evaluation of accuracy in funding
allocations must be accomplished using funds allocated through
estimates in comparison with funds allocated using actual
population counts.¹

This  report  argues  that  the  concept  of  population  allocation 
accuracy  should  be  routinely  included  in  evaluations  of  models 
and  estimates  resulting  from  them  when  population  estimates 
are  used  for  allocating  funds.  Because  of  the  importance  of 
accurately  allocating  funds,  the  millions  of  State  and  Federal 
dollars  that  are  annually  allocated,  and  the  integral  role  that 
county  population  estimates  play  in  determining  these  alloca- 
tions, the  report  addresses  the  concept  of  allocation  accuracy  by 
focusing  on  county  population  estimates. 

In  order  to  provide  a  comparative  basis  to  illustrate  both  the 
shortcomings  of  traditional  model  evaluation  criteria  and  the 
utility  of  the  proposed  index  for  population  allocation  accu- 
racy, the  ratio-correlation  and  difference-correlation  forms  are 
evaluated  using  data  for  Washington  State.  Keep  in  mind  that 
the  purpose  of  this  comparison  is  not  to  determine  if  one  of  the 
two  regression  forms  is  universally  superior  for  allocating 
population.  This  issue  should  be  reviewed  on  a  State-by-State 
basis  since  it  is  an  empirical  rather  than  a  theoretical  question 
(Namboodiri,  1972:452).  Neither  is  it  implied  by  focusing 
exclusively  on  these  two  regression  forms  that  other  methods 
are  inferior.  It  is,  rather,  the  purpose  of  the  report  to  illustrate 
the  role  that  an  index  of  allocation  accuracy  can  play  in  the 
evaluation of competing regression models and the actual
allocation  accuracy  of  estimates  produced  by  regression  and 
other  methods.  The  index,  termed  the  Index  of  Misallocation,  is 
described  later  in  the  report.  This  index,  it  is  argued,  is 
especially  useful  where  the  intent  of  an  allocation  procedure  is 
to  minimize  either  directly  or  indirectly  the  misallocation  of 
finite  funds  or  other  resources. 

THE  REGRESSION  FORMS 

Both  the  ratio-correlation  and  the  difference-correlation 
methods  use  proportional  numbers,  which  means  that  the 
county  populations  must  sum  to  an  independently  estimated 
State  total.  Both  forms  provide  a  model  that  estimates  the 
temporal  change  in  the  county  population  proportions.  The 
independent  variables  are  the  temporal  changes  in  the  county 
proportions  of  symptomatic  indicators  such  as  auto  registra- 
tions, employment,  registered  voters,  and  the  like.  The  only 
dissimilarity  between  the  two  forms  is  in  the  measurement  of 
temporal  change.  In  the  ratio-correlation  form,  temporal  change 
is  measured  by  taking  a  ratio  of  proportions  at  two  points  in 
time,  while  in  the  difference-correlation  form  the  temporal 
change  is  measured  by  subtraction.  The  models  are  described 
more  formally  in  the  appendix. 
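
The two measures of temporal change described above can be sketched as follows. This is only an illustration under assumed inputs: the function names and the three-county voter counts are hypothetical and do not come from the Washington State data.

```python
# Minimal sketch: ratio versus difference measures of change in county
# proportions. All names and figures here are hypothetical illustrations.

def proportions(values):
    """Return each county's share of the State total."""
    total = sum(values)
    return [v / total for v in values]

def ratio_change(later, earlier):
    """Ratio-correlation form: ratio of the later share to the earlier share."""
    return [l / e for l, e in zip(proportions(later), proportions(earlier))]

def difference_change(later, earlier):
    """Difference-correlation form: later share minus earlier share."""
    return [l - e for l, e in zip(proportions(later), proportions(earlier))]

# Hypothetical registered-voter counts for three counties at two census dates.
voters_1960 = [1000, 3000, 6000]
voters_1970 = [1500, 3000, 7500]

ratios = ratio_change(voters_1970, voters_1960)
diffs = difference_change(voters_1970, voters_1960)
print(ratios)
print(diffs)
```

Because both sets of shares sum to one, the differences sum to zero, whereas the ratios are centered near one. This is one reason the two forms yield residuals on very different scales.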

Once a model is constructed, the current symptomatic
indicators are substituted into it to estimate the changes in
proportions, and these estimated changes are then converted
algebraically into actual population numbers. The
estimated  county  population  numbers  are  then  adjusted  to  the 
independently  derived  total  State  population  by  multiplying  the 
ratio  of  the  independently  derived  State  total  to  the  sum  of 
the  estimated  county  populations  by  each  estimated  county 
population. 
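
The adjustment to the State total described above amounts to a single proportional scaling. A minimal sketch follows, assuming hypothetical uncontrolled county estimates and a hypothetical State total; the function name is illustrative only.

```python
# Minimal sketch: "controlling" county estimates to an independently
# derived State total. The figures are hypothetical.

def control_to_total(county_estimates, state_total):
    """Scale each county estimate by (State total / sum of county estimates)."""
    factor = state_total / sum(county_estimates)
    return [factor * estimate for estimate in county_estimates]

raw_estimates = [14000.0, 9500.0, 29000.0, 4500.0]  # sums to 57,000
state_total = 60000.0

controlled = control_to_total(raw_estimates, state_total)
print(controlled)
print(sum(controlled))
```

The scaling leaves each county's share of the summed estimates unchanged; only the level is altered.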

Although  Ericksen  (1973,  1974)  argues  that  regression 
models  incorporating  sample  data  are  useful  in  estimating 
county  populations,  this  approach  is  not  often  found  in  actual 
practice.  Consequently,  the  discussion  of  the  regression  models 
and  procedures  in  this  report  is  directed  toward  complete  rather 
than  sample  data. 

TEST  DATA  AND  MODELS 

Using  Washington  State  as  an  example,  enrollment  in  grades 
1  to  8  is  the  county  "population"  selected  for  estimation 
because  a  series  of  annual  estimates  can  be  compared  to  a  series 
of annual actual reported enrollment.² Three symptomatic
indicators are used: employment (covered by unemployment
insurance); registered voters; and registered private
automobiles.³ The two observation points over which the
models are constructed are the census years, 1970 and 1960; the
39  counties  of  Washington  are  the  cases.  The  years  for  which 
estimates  derived  from  the  1970-1960  based  models  are 
compared  with  actual  reported  enrollments  are  1971  through 
1977.  The  three  symptomatic  variables  used  in  the  regression 


¹It is important to realize that the actual population counts against
which estimates are judged for accuracy are themselves not "true"
population totals. They merely represent a convention used for
allocations. This point is pursued by Keyfitz (1979).


²The data used in the report are in the official Washington State
Population  Data  Base  maintained  by  the  Forecast  and  Support  Division, 
Office  of  Financial  Management,  Olympia,  Washington. 
forms were selected because they represent a subset of the
four  variables  usually  used  to  estimate  county  populations  in 
Washington,  the  fourth  variable  being  the  grades  1  to  8 
enrollment.  Remember,  the  intent  of  the  example  data  set  is  to 
illustrate  the  concept  of  misallocation  and  its  measurement.  The 
example  data  set  is  not  meant  to  imply  that  the  models  that 
could  be  constructed  using  these  three  symptomatic  indicators 
are  those  that  would  be  actually  used  for  county  population 
estimates; neither is it meant to imply that an estimate of
enrollment  is  a  necessary  step  in  producing  a  county  population 
estimate.  The  example  data  set  was  selected  because  (1)  it 
utilized  a  set  of  symptomatic  indicators  that  were  known  to  be 
"clean"  and  were  readily  available,  without  any  prior  knowledge 
of  the  behavior  of  the  models  (or  their  alternatives)  that  could 
be  constructed  from  these  data;  and  (2)  because  the  estimated 
dependent  variable  could  be  compared  with  actual  reported  data 
for  a  current  8-year  period— an  evaluation  not  possible  with 
actual  county  population  figures,  which  would  only  be  available 
for  testing  at  a  single  point  in  time  (1970)  using  models 
constructed  over  the  1960-1950  period. 

Typically,  the  evaluation  of  candidate  regression  models 
involves a two-step procedure. The first step, termed here the
model construction step, is evaluated using traditional regression
"goodness of fit" indicators such as R² and the Standard Error
of Estimate (S.E.E.). The second step, termed here the model
estimation  accuracy  step,  is  conducted  by  comparing  estimates 
produced  by  candidate  models  with  actual,  enumerated  popula- 
tions. Typically,  the  summary  indices  used  in  this  second  step 
are  the  mean  of  the  absolute  percentage  differences,  mean  square 
error,  or  the  number  of  times  a  given  level  of  error  is  exceeded. 
The  advantage  of  using  both  steps  to  evaluate  candidate  models 
is  obvious:  indices  of  model  "fit"  used  in  the  first  step  can  be 
checked  against  the  indices  of  estimation  "fit"  used  in  the 
second  step.  However,  a  major  disadvantage  of  this  procedure  is 
that it evaluates models that are two decades old. For example,
the most recent models that have undergone the full evaluation
are  those  that  were  constructed  using  1960-1950  data  and 
evaluated  with  1970  census  data.  However,  by  using  enrollment 
data  as  the  dependent  estimation  variable,  in  this  report  models 
constructed  using  1970-1960  data  can  be  evaluated  since 
enrollment  data  are  available  subsequent  to  1970  for  use  in  the 
second  step  of  evaluation. 
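
The summary indices named above for the second step can be sketched as follows. The function names, the three-area data, and the 4-percent threshold are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Minimal sketch of the second-step accuracy indices: mean absolute
# percentage difference, mean square error, and the number of areas whose
# error exceeds a chosen level. All data are hypothetical.

def mean_abs_pct_error(estimates, actuals):
    """Mean of |estimate - actual| / actual, expressed in percent."""
    errors = [abs(e - a) / a for e, a in zip(estimates, actuals)]
    return 100.0 * sum(errors) / len(errors)

def mean_square_error(estimates, actuals):
    """Mean of the squared differences between estimates and actuals."""
    return sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals)

def count_exceeding(estimates, actuals, threshold_pct):
    """Number of areas whose absolute percentage error exceeds threshold_pct."""
    return sum(
        1 for e, a in zip(estimates, actuals)
        if 100.0 * abs(e - a) / a > threshold_pct
    )

estimates = [1050.0, 1900.0, 3000.0]
actuals = [1000.0, 2000.0, 3000.0]

mape = mean_abs_pct_error(estimates, actuals)
mse = mean_square_error(estimates, actuals)
n_over = count_exceeding(estimates, actuals, 4.0)
print(mape, mse, n_over)
```

None of these indices refers to the State total, a point taken up in the discussion of misallocation.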

The  two  models  constructed  using  the  1970-1960  data  are 
described in table 1. Generally, both models exhibit favorable
characteristics. By the traditional criteria, the Coefficient
of Determination (R²) and the Standard Error of Estimate (S.E.E.),
the difference-correlation model would be regarded as superior
to the ratio-correlation model, since it has a higher R² and a
lower S.E.E. For the S.E.E. especially, however, this is because
"differences" produce smaller values than do "ratios" over
variables constructed using Washington State's symptomatic and
enrollment proportions by county.

Although  additional  model  evaluation  criteria  are  available, 
such as residual examination (see, e.g., Draper and Smith, 1966;
Goldberger, 1964), they still do not address the concept of
allocation "goodness of fit"; neither do the traditional measures
of  estimation  "fit"  for  the  second  step  of  model  evaluation  (see, 
e.g.,  Goldberg  et  al.,  1964;  O'Hare,  1976;  Pursell,  1970; 
Rosenberg,  1968;  Serow  and  Martin,  1977;  Martin  and  Serow, 
1978;  Spar  and  Martin,  1979;  Swanson,  1978;  Zitter  and 
Shryock,  1964;  or  Zitter  and  Word,  1971).  Before  proceeding  to 
an  evaluation  of  the  two  models  just  described,  it  is  appropriate 
to  address  the  issue  of  allocation  accuracy  and  describe  a 
summary  index  for  it. 

MISALLOCATION  AND  ITS  MEASUREMENT 

Perhaps  the  most  important  impetus  for  annually  estimating 
county  and  other  sub-State  area  populations  accurately  is  to 
provide  a  basis  for  the  allocation  of  funds.  To  the  extent  that  a 


Table 1. Multiple Regression Models

                          Ratio-correlation form          Difference-correlation form
                        Unstandardized  Standardized     Unstandardized  Standardized
                          regression     regression        regression     regression
Symptomatic indicator    coefficient    coefficient       coefficient    coefficient

Intercept                  -.01469        0.0000            0.0000         0.0000
Employment                  .07633         .13889           -.42146        -.44870
Voters                      .22181         .17617           1.49536         .79796
Registered Autos            .66428         .62861            .64890         .41239

R                                 .8679                            .90619
R²                                .7532                            .82117
Adj. R²                           .7321                            .80584
S.E.E.                            .0744                            .00259
Index of Misallocation            2.73                             2.90
given State total population is misallocated among the State's
counties,  the  funds  from  a  given  revenue  sharing  program 
depending  on  population  estimates  for  allocation  will  also  be 
misallocated.  Where  misallocation  occurs,  one  or  more  counties 
obtain  more  than  intended  entitlements  at  the  expense  of  one 
or  more  other  counties,  which,  in  turn,  receive  less  than 
intended  entitlements.  As  an  example,  suppose  that  funds  are 
distributed  directly  in  relation  to  population  and  that  a  given 
county  is  estimated  to  have  35  percent  of  the  State's  total 
population  while  in  fact  it  has  actually  only  25  percent.  This 
county,  regardless  of  whether  the  State  total  population  is 
incorrectly  or  accurately  estimated,  will  receive  35  percent  of 
the  revenue  sharing  funds  and  the  remaining  counties  will 
receive 65 percent. In this case, the county will unfairly receive
40 percent more than its entitlement, while the remaining
counties together lose an amount equal to 10 percent of the State funds.
Obviously,  in  order  to  conform  to  legislative  intent,  it  is  of 
paramount  importance  that  county  population  estimates  be 
accurate  with  respect  to  their  shares  of  the  State  total  when 
these  estimates  are  used  as  a  basis  for  fund  distribution.  It  is 
equally  obvious  that  misallocation  is  measured  neither  by  the 
set  of  traditional  measures  used  directly  in  evaluating  model 
accuracy  (such  as  goodness  of  fit  measures  like  the  coefficient  of 
determination  in  the  case  of  a  regression  model  or  mean  square 
error,  say,  in  the  case  of  other  procedures)  nor  by  the  set  used 
in  evaluating  subsequent  accuracy  (means  of  squared  errors  or 
absolute  percentage  differences).  A  summary  index  that  does 
directly  measure  misallocation  is  described  in  the  following 
section. 
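
The arithmetic of the 35-percent versus 25-percent example can be checked with a short calculation. The dollar total below is a hypothetical figure introduced only for illustration; the text gives shares, not amounts.

```python
# Worked check of the example: a county estimated at 35 percent of the
# State population that actually holds 25 percent.
# total_funds is a hypothetical amount, not a figure from the report.

total_funds = 1_000_000.0
estimated_share = 0.35
actual_share = 0.25

county_received = estimated_share * total_funds
county_entitled = actual_share * total_funds
# Excess relative to the county's own entitlement.
county_excess_pct = 100.0 * (county_received - county_entitled) / county_entitled

others_received = (1.0 - estimated_share) * total_funds
others_entitled = (1.0 - actual_share) * total_funds
others_loss = others_entitled - others_received
# Loss expressed against total State funds and against the others' entitlement.
others_loss_pct_of_total = 100.0 * others_loss / total_funds
others_loss_pct_of_entitlement = 100.0 * others_loss / others_entitled

print(county_excess_pct, others_loss_pct_of_total, others_loss_pct_of_entitlement)
```

The same 10 points of misallocated share therefore appear as a 40 percent gain to the one county, a loss of 10 percent of total State funds to the others, and roughly a 13 percent shortfall against the others' combined entitlement.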

A  SUMMARY  INDEX  OF  MISALLOCATION 

Although  it  has  been  commonly  found  in  the  demographic 
and  sociological  literature  for  many  years  under  different 
names,  the  Index  of  Dissimilarity  (Duncan  and  Duncan,  1955; 
or Shryock and Siegel, 1973:232-233) is, with a slight
adaptation,  immediately  suitable  as  a  summary  index  of 
misallocation.  Defined  in  the  following  formula,  this  adaptation 
of the Index of Dissimilarity will be referred to as the Index of
Misallocation  (IM). 


$$IM = \frac{\tfrac{1}{2} \sum_{i} \left| P_{ei} - P_{ai} \right|}{P} \times 100$$

Where $P_{ei}$ = estimated population in county i,

      $P_{ai}$ = actual population in county i,

      $P$ = total State population, to which the values of $P_{ei}$
      are adjusted so that $\sum_{i} P_{ei} = P$.

The index simply sums the absolute values of the differences
between the adjusted estimates and the actuals, divides this sum
by 2, divides the entire result by the total State population and,
finally, multiplies the preceding steps by 100 so that misallocation
is defined as a percentage. The index gives the percent of
the total State population that must be reallocated in order to
have the estimates achieve zero misallocation.

An example of the interpretation of the index is as follows.
Suppose that there are four counties in a hypothetical State
with the following estimated and actual populations. (Note that
the estimated county populations have already been adjusted to
this hypothetical State's total population).

County        Estimated        Actual

A               15,000         12,000
B               10,000         11,000
C               30,000         32,000
D                5,000          5,000


In  this  example,  IM  is  found  to  be  5.00,  which  is  interpreted 
to  mean  that  5  percent  of  the  estimated  county  populations 
needs  to  be  redistributed  in  order  to  achieve  zero  misallocation. 
The  straightforward  interpretation  of  misallocation  available  by 
using  IM  is  not  available  with  other,  more  traditional  summary 
measures  of  error.  For  example,  the  mean  of  the  absolute 
percentage  errors,  which  for  counties  A  through  D  is  10.09 
percent,  is  computed  without  reference  to  the  State  total:  each 
county's  actual  population  is  the  denominator  for  the  county's 
percentage  error  and  the  mean  of  these  errors  is  referenced  to 
the  total  number  of  counties.  Since  reference  to  the  State  total 
is  lacking,  the  mean  of  the  absolute  percentage  errors  can  not  be 
interpreted  in  terms  of  the  total  allocation  error  experienced  by 
the  entire  State:  one  can  not  say  that  10.09  percent  of  the 
State's  total  population  is  misallocated;  neither  can  one  say  that 
by  redistributing,  on  the  average,  10.09  percent  of  each 
county's  population  that  zero  misallocation  will  result.  This 
mean— like  mean  square  error— is  simply  not  interpretable  in 
terms  of  misallocation.  There  are  other  possible  interpretations 
of  misallocation.^  However,  IM  is  desirable  because  its  interpre- 
tation is  so  straightfoward:  it  gives  the  percent  of  the  State's 
total  population  that  must  be  reallocated  in  order  to  achieve 
zero  misallocation. 

To  the  extent  that  revenue  sharing  funds  are  directly  related 
to  county  population  shares,  the  index  is  a  proxy  for  the 
misallocation  of  funds.  Note,  also,  that  IM  can  be  directly 
applied  to  funding  allocations  by  measuring  funds  allocated 
using  estimated  populations  in  conjunction  with  funds  allocated 
using  actual  populations.  This  direct  evaluation  of  funding 
misallocation  would  be  especially  appropriate  for  the  Federal 
revenue-sharing  areas,  which  may  be  only  bound  by  or  depend 
upon  a  population  estimate  as  a  component  of  the  revenue- 


P  .     =     (P  .,     )  X  P 
ei  ei 

(XP  .,) 


P  .,  =  unadjusted  county  i  estimate 
ei 


^In  a  personal  communication  from  Fred  Cavanaugh  of  the  U.S. 
Bureau  of  the  Census,  other  interpretations  of  accuracy  described  in 
internal  Bureau  memoranda  were  described.  Rati!  et  ai.  (1974)  describe 
an  alternative  summary  index  of  allocation  error,  but  its  interpretation  is 
not  as  straightforward  as  that  available  for  IM. 
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sharing  fund  allocation.  With  the  concept  of  misallocation 
defined  and  a  summary  index  for  it,  the  allocation  accuracy  of 
the  two  models  given  earlier  can  be  evaluated. 
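
The four-county example above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, not part of the paper, and the IM function assumes that the estimates should first be controlled to the actual State total, as the definition of the index requires. The mean of the absolute percentage errors is computed alongside for comparison.

```python
def index_of_misallocation(estimated, actual):
    """Return IM: half the summed absolute differences, as a percent of the total."""
    total = sum(actual)
    # Control the estimates to the actual State total before comparing.
    scale = total / sum(estimated)
    adjusted = [e * scale for e in estimated]
    half_sum = 0.5 * sum(abs(e - a) for e, a in zip(adjusted, actual))
    return half_sum / total * 100


def mean_absolute_percentage_error(estimated, actual):
    """Return the mean of the county-level absolute percentage errors."""
    errors = [abs(e - a) / a for e, a in zip(estimated, actual)]
    return sum(errors) / len(errors) * 100


estimated = [15_000, 10_000, 30_000, 5_000]  # counties A through D
actual = [12_000, 11_000, 32_000, 5_000]

print(round(index_of_misallocation(estimated, actual), 2))          # 5.0
print(round(mean_absolute_percentage_error(estimated, actual), 2))  # 10.09
```

Because the IM function needs only two parallel lists, the same call could be applied to dollar amounts, comparing funds allocated from estimated populations with funds allocated from actual populations.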

TEST  RESULTS  AND  DISCUSSION 

Using the models given in table 1, estimates of county school
enrollments in grades 1 through 8, adjusted to the actual
Washington State totals, were generated for each year 1971 to
1977 and compared with the actual reported enrollments. In
addition, the models were evaluated at 1970, the end point of
the two observation years used to build them, by comparing the
adjusted estimated county enrollments with the actual county
enrollments.* Using the Index of Misallocation, the annual level
of allocation accuracy of the estimates is presented in table 2.

Table 2. Allocation Accuracy of the Two Regression Models

                     Index of misallocation¹

Year        Ratio-correlation     Difference-correlation

1970²             2.73                    2.90
1971              0.72                    1.76
1972              0.10                    3.43
1973              1.79                    4.07
1974              2.24                    3.88
1975              2.53                    3.78
1976              2.76                    3.60
1977              3.26                    4.08

¹ Computed from estimates adjusted to the actual state total.

² End point of the two observation years used in model construction.

While both models exhibit generally low levels of misallocation,
the summary level of misallocation is lower for the
ratio-correlation generated estimates than for those derived
from the difference-correlation model for each year. This
evidence strongly supports an argument favoring the choice of
the ratio-correlation over the difference-correlation model for
estimating the county "population" selected for this test in
Washington. Note that the Index of Misallocation is lower for
the ratio-correlation model in 1970. This 1970 information
presents a different perspective on the suitability of the two
regression models than do the Coefficient of Determination and
Standard Error of Estimate. The Coefficient of Determination
and Standard Error of Estimate indicate the superiority of the
difference-correlation model; although the IM values are only
slightly different, the Index of Misallocation for 1970 indicates
that the ratio-correlation model is superior in its allocation
accuracy: obviously, these two traditional criteria, R² and
S.E.E., do not directly measure allocation accuracy.

* It would also have been just as easy to transform the estimated
change in enrollment proportions between 1960 and 1970 into the 1960
allocation of enrollment by county. Consequently, an IM value for 1960
instead of 1970 could have been calculated.

In  the  example  just  presented,  the  difference-correlation 
model  would  likely  have  been  selected  over  the  ratio-correlation 
model  if  only  these  traditional  first-step  criteria  of  model 
adequacy  had  been  used  and  a  second-step  evaluation  had  not 
been  undertaken.  This  point  is  worth  some  discussion.  In  some 
situations,  the  only  evidence  available  for  selecting  a  candidate 
model  may  be  the  first-step  evaluation  criteria.  There  are 
situations  where  it  is  not  possible  to  conduct  both  steps  of 
evaluation.  Had  this  been  the  case  for  the  example  just 
presented,  the  difference-correlation  model  would  probably 
have been selected over the ratio-correlation model if only the
traditional  criteria  had  been  used.  If  this  had  been  the  case,  it 
would  have  resulted  in  uniformly  higher  misallocation  error  for 
each  of  the  years  under  examination,  1971  to  1977.  However, 
by  introducing  the  Index  of  Misallocation  directly  as  a  first-step 
model  evaluation  criterion  and  measuring  the  misallocation  level 
of the 1970 estimates, the ratio-correlation model is identified
as superior in its allocation accuracy. This identification, as the
example clearly shows, is not made by the traditional first-step
criteria of R², S.E.E., and the like. The subsequent comparisons
of  allocation  accuracy  for  the  two  models  support  the  utility  of 
applying  IM  as  a  first-step  model  evaluation  criterion. 
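
The two-step comparison can be tabulated directly from the IM values reported in table 2. The following is a minimal sketch in Python; the values are transcribed from that table, and the variable and function names are hypothetical.

```python
# IM values from table 2: year -> (ratio-correlation, difference-correlation).
TABLE_2 = {
    1970: (2.73, 2.90),
    1971: (0.72, 1.76),
    1972: (0.10, 3.43),
    1973: (1.79, 4.07),
    1974: (2.24, 3.88),
    1975: (2.53, 3.78),
    1976: (2.76, 3.60),
    1977: (3.26, 4.08),
}


def preferred_model(ratio_im, difference_im):
    """Name the model with the lower (better) Index of Misallocation."""
    if ratio_im < difference_im:
        return "ratio-correlation"
    return "difference-correlation"


# First step: the 1970 end point used in model construction.
first_step_choice = preferred_model(*TABLE_2[1970])

# Second step: the subsequent test years, 1971 to 1977.
second_step_choices = {
    year: preferred_model(*values)
    for year, values in TABLE_2.items()
    if year > 1970
}

print(first_step_choice)
print(all(choice == first_step_choice for choice in second_step_choices.values()))
```

If the first-step choice agrees with every second-step year, as the text reports for these data, the second line printed is `True`.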

Although  the  concept  of  allocation  accuracy  may  have  more 
general  applicability,  it  raises,  as  mentioned  earlier,  issues 
specific  to  the  ratio-correlation  and  difference-correlation  forms 
of  multiple  regression.  These  issues  deserve  additional  attention. 
Both of these forms were developed within, and continue to be
used solely within, the field of population estimation. They are
not commonly found in applied areas that may be more familiar to
statisticians. As a result of this insulation from the greater
community of statisticians, there remain two subtle and, until
now, unanalyzed characteristics that distort the evaluation of
these forms: they contribute to the inability of traditional
measures of "goodness of fit" to capture the implicit intent of
these regression forms, which is accurate allocation.

The  more  general  characteristic  is  that  these  forms  rely  on 
transformed  data  to  effect  the  allocation  of  N  items  among  K 
categories,  where  N  is  determined  independently  of  the  regres- 
sion model.  The  transformations  produce  a  dependent  variable 
in  the  model  that  is  the  temporal  change  in  a  proportion.  This 
transformation  is  at  once  useful  because  it  controls  for 
differential  population  size  and  yet  allows  for  the  algebraic 
manipulation  of  a  model's  estimates  of  these  temporal  changes 
into  actual  population  numbers.  This  is,  indeed,  a  powerful  and 
flexible  feature  of  both  of  these  regression  forms.  However,  at 
the  same  time,  it  does  not  allow  a  direct  evaluation  of  allocation 
accuracy  since  the  dependent  variable  in  the  model  must  be 
algebraically  manipulated  and  controlled  to  an  independently 
determined  State  total  to  produce  the  final,  usable  result— a  set 
of  county  populations.  Consequently,  the  summary  statistics 
such as R² and S.E.E. suffer from the same problem outlined
earlier for the mean of absolute percentage errors: they are
computed without reference to a State total and, consequently,
do not offer an evaluation of allocation accuracy. Without a
measure like IM, an evaluation of the adequacy of these two
regression forms is incomplete: they are used to generate
temporal changes in proportions that are algebraically
manipulated into population allocations, while at the same time
no assessment of allocation accuracy is possible within the
regression  forms. 
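
The algebraic manipulation described above, from an estimated temporal change in a proportion to a set of county populations controlled to the State total, might be sketched as follows. The function and argument names are hypothetical and the input values are invented for illustration; in practice the predicted changes would come from a fitted ratio-correlation or difference-correlation model.

```python
def allocate_counties(base_populations, predicted_change, state_total, form="ratio"):
    """Turn predicted changes in county shares into controlled populations.

    base_populations: county counts at the most recent census (time T0).
    predicted_change: model output per county; a ratio of shares for the
        ratio-correlation form, a difference of shares for the
        difference-correlation form.
    state_total: independently determined State total for the estimate date.
    """
    base_total = sum(base_populations)
    base_shares = [p / base_total for p in base_populations]

    if form == "ratio":
        raw_shares = [s * y for s, y in zip(base_shares, predicted_change)]
    elif form == "difference":
        raw_shares = [s + y for s, y in zip(base_shares, predicted_change)]
    else:
        raise ValueError("form must be 'ratio' or 'difference'")

    # Control to the State total: rescale so the shares sum to one.
    share_sum = sum(raw_shares)
    return [state_total * s / share_sum for s in raw_shares]


# Invented inputs: three counties, one of which dominates the State.
census_1970 = [600_000, 300_000, 100_000]
estimates = allocate_counties(census_1970, [1.02, 0.97, 1.00], state_total=1_100_000)
print([round(e) for e in estimates])
```

The sketch makes the point of the paragraph visible: the regression output enters only as `predicted_change`, while the census base and the State total, both external to the model, determine the final allocation.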

The  second  distorting  characteristic,  subsumed  within  the 
first  but  a  distinct  issue,  is  that  the  historical  pattern  of  the 
distribution  of  a  given  State's  population  among  its  counties 
ultimately shapes the allocation generated from a given regression
model. In Washington, for example, about 30 percent of the
State's  total  population  resides  in  only  one  of  its  39  counties. 
Since  this  pattern  of  dominance  persists  over  time,  the  specific 
allocations  are  always  shaped  by  it  because  the  most  recent 
county  population  census  number  is  used  in  the  algebraic 
manipulation  resulting  in  a  current  population  estimate.  Since 
this  historical  pattern  is  external  to  the  regression  model,  again, 
the  suitability  of  a  given  model  is  never  fully  captured  by 
traditional measures of fit such as R² and S.E.E. While the
pattern  of  county  population  distributions  that  exists  in 
Washington  may  not  be  the  same  as  those  found  in  other  States, 
the  fact  remains  that  in  any  State  its  historical  pattern  will  never 
be  reflected  in  the  traditional  model  evaluation  criteria. 
Consequently,  a  measure  like  IM  is,  again,  required  in  order  to 
evaluate  the  "allocation  fit"  of  a  given  model. 

The  example  results  strongly  support  the  argument  that  IM 
should  be  used  as  a  criterion  of  model  "fit"  in  the  first  step  of 
model  evaluation.  The  finding  that  the  IM  value  in  the  first  step 
indicated  the  subsequent  allocation  accuracy  differentials  found 
in  the  second  step  of  evaluation  may  be  useful  in  situations 
where  second-step  evaluations  are  not  possible  or  for  some 
reason  they  are  not  used.  However,  IM  should  not  be  used  as  a 
sole  criterion  for  evaluating  models  using  the  ratio-correlation  or 
difference-correlation  forms.  The  instability  of  variable  relation- 
ships over  time  is  potentially  so  strong  that  an  estimation 
procedure  must  be  prepared  to  utilize  external  information  (see, 
e.g.,  Swanson,  1978). 

Because of this instability and other problems, IM cannot be
expected to guarantee the adequacy of subsequent estimates
using a model it favors, although it does accomplish this with
the  data  presented  here.  Additional  examinations  in  other  States 
using  the  full  two-step  evaluation  procedure  are  needed. 
Another  cautionary  note  is  that  IM  is  intended  to  be  a  summary 
measure  of  allocation  error.  It  does  not  indicate  where  the 
allocation  error  occurs  relative  to  individual  cases.  In  many 
situations,  it  is  important  to  take  into  account  individual  county 
errors.  Selection  of  any  model  or  procedure  requires  some 
degree  of  judgment,  and  evidence  based  on  a  county-by-county 
examination  is  often  useful. 

SUMMARY 

This  report  argues  that  allocation  accuracy  is  an  important 
dimension  of  population  estimation  accuracy  that  should  be 
routinely  applied  to  evaluations  involving  population  estimates 
since  many  of  these  estimates  are  used  for  funding  allocations. 


This  perspective  is,  of  course,  not  of  particular  interest  to  an 
individual  county  or  other  revenue  receiving  subunit;  it  is  of 
interest  when  taken  over  a  logical  set  of  counties  or  other 
subunits.  The  report  describes  a  summary  measure  of  allocation 
accuracy,  the  Index  of  Misallocation,  which  is  designed  to 
capture  this  dimension  of  accuracy  and  give  it  a  clear,  concise 
meaning. Given the millions of Federal and State dollars that are
annually allocated using population estimates^, and the intent
of legislation regarding the accurate allocation of these funds, it
is remarkable that this concept has not been routinely used.
This  absence  is  especially  noticeable  in  conjunction  with 
population  estimates  derived  from  the  two  regression  forms 
discussed  in  the  report,  which  are  used  solely  for  estimation  of 
population,  and  for  which  the  concept  of  allocation  "goodness 
of  fit"  is  completely  overlooked  in  standard  manuals  and  the 
literature.  The  Index  of  Misallocation  should  be  used  as  one  of 
the  criteria  of  model  adequacy  for  these  regression  forms  and, 
further,  as  a  criterion  of  "estimation"  fit  in  the  second  step  of 
model  evaluation  usually  undertaken  in  studies  of  estimation 
accuracy.  Like  existing  criteria,  it  should  not  be  used  to  the 
exclusion  of  other  information. 

With data from the 1980 Census of Population soon to become
available, and with them the opportunity to test and refine
regression and other types of estimation methods, the discussion
of allocation accuracy is timely. The summary index of allocation
error described in the report can be easily included in any
evaluation. Its inclusion is important because it offers an
assessment of allocation accuracy, the fiscal implications of
which can be immediately grasped by administrators, policymakers,
and other nontechnical personnel at the Federal, State, and local
levels. This potential use of the concept of allocation accuracy
gives the index described here advantages over other possible
indices of allocation accuracy, because of its clear
interpretation and its long history of use in other areas (for
recent discussions see Cortese et al., 1976 and 1978; and Massey,
1978).


Appendix


The    ratio-correlation    and    difference-correlation    forms   of 
multiple  regression  are  defined  as: 

$$
Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k + E
$$

Where (1)   $a_i$ = coefficient to be estimated
            $E$ = error term

      (2.a) In the ratio-correlation form:

$$
Y = \left( \frac{\text{Population in County, time} = T_{0+y}}{\text{State Total Population}} \right) \div \left( \frac{\text{Population in County, time} = T_{0}}{\text{State Total Population}} \right) = \frac{Y_x}{Y_0}
$$

$$
X_i = \left( \frac{\text{Symptomatic Indicator in County, time} = T_{0+y}}{\text{State Total Indicator}} \right) \div \left( \frac{\text{Symptomatic Indicator in County, time} = T_{0}}{\text{State Total Indicator}} \right) = \frac{X_{ix}}{X_{i0}}
$$

      (2.b) In the difference-correlation form:

$$
Y = Y_x - Y_0
$$

$$
X_i = X_{ix} - X_{i0}
$$


Discussant 

Howard  N.  Martin 

Director  of  Research 

Houston  Chamber  of  Commerce 


The  two  authors  we  have  heard  today  are  to  be  congratulated 
on  several  points.  Frederick  Cavanaugh  has  presented  a  concise 
overview  of  the  Census  Bureau's  plans  for  tests  of  methods,  and 
David  Swanson  has  raised  some  novel  points  concerning  criteria 
for choosing estimating methods. Both papers are clearly
written  and  well-documented— features  every  reader  appreciates. 

My  overall  observation  is  that  both  papers  provide  thorough 
and  stimulating  treatment  of  their  topics.  I  am  generally 
sympathetic  with  the  contents  of  both,  and  as  I  read  this 
material,  I  found  myself  registering  agreement  more  often  than 
disagreement. 


for  application  to  a  rather  long  list  of  activities  relating  to 
local  areas. 

There  is  no  question  that  the  intercensal  population  esti- 
mates carry  a  great  deal  of  weight  in  many  types  of  decisions.  In 
view  of  the  widespread  impact  that  these  estimates  have,  an 
intensive  program  to  test  estimating  methods  is  indeed  welcome. 

The  Census  Bureau  has  a  long  history  of  testing  its  estimating 
methods  against  the  decennial  benchmarks  provided  by  the 
population  and  housing  censuses.  The  growing  ramifications  of 
the  intercensal  estimates  make  these  tests— and  the  dissemination 
of  their  results— more  important  now  than  ever  before. 


SIGNIFICANCE  AND  USES  OF  POPULATION 
ESTIMATES 

Both  papers  we  have  heard  this  morning  involve  allocation 
of  funds,  whether  through  Federal  revenue  sharing  or  through 
State  and  other  distributions.  I  certainly  agree  that  population 
estimates  for  such  purposes  are  extremely  important,  but  I 
would  like  to  point  out  that  these  estimates  also  are  put  to  a 
wide  variety  of  other  uses,  both  in  the  private  sector  and  in  the 
public  sector. 

For example, Edward Spar, our session chairman this morning,
annually prepares population estimates for all counties in
the  Nation,  and  his  numbers  are  used  by  analysts  concerned 
with  comparative  market  evaluation.  His  data  depend  in  part  on 
the  Census  Bureau's  mid-year  revenue  sharing  estimates.  This  is 
just  one  example  of  an  application  other  than  revenue  sharing 
which    has    important    consequences    for    different    localities. 

Also,  population  estimates  are  especially  important  for  many 
business  decisions  about  where  to  locate,  and  thus  affect  the 
economies  of  the  Nation's  regions,  because  new  businesses  mean 
new  jobs— and  loss  of  businesses  means  loss  of  jobs.  Manu- 
facturers of  many  different  kinds  of  consumer  goods,  for 
instance,  want  to  locate  their  plants  in  areas  which  are  large 
enough  to  provide  good  markets  for  their  products. 

In  addition  to  statewide  estimates,  population  estimates  are 
made  by  local  groups— generally  on  a  regional,  county,  or  sub- 
county  basis— both  in  the  private  sector  and  in  the  public  sector. 


MIGRATION 

One  of  the  key  problems  I  have  encountered  in  estimating 
population  arises  from  migration.  I  hope  that  special  attention 
is  being  given,  in  the  tests  of  methods,  to  this  component  of 
population  change. 

I  have  observed  that  change  in  elementary  school  enrollment 
over  a  period  of  years  is  not  so  sensitive  as  desired  in  tracking 
inmigration  in  very  dynamic  metropolitan  areas— areas  with 
numerous job opportunities attracting primarily young
people ... young singles as well as young couples with no
children  or  with  children  below  elementary  school  age.  In  a 
metropolitan  area  with  which  I  have  direct  experience,  house- 
holds with  children  of  elementary  school  age  are  a  smaller  share 
of  migrants  than  of  nonmigrants.  I  suspect,  therefore,  that  as 
a  result  of  the  use  of  school  enrollment  in  preparing  estimates, 
dynamic  areas  with  large  net  migration  gains  may  be  systemat- 
ically underestimated,  while  areas  with  significant  migration 
losses  are  systematically  overestimated.  Presumably,  this  bias 
will    be    an    important   focus    of   the   1980  tests  of  methods. 

ADMINISTRATIVE  RECORDS 

Mr.  Cavanaugh  mentioned  the  use  of  administrative  records 
several  times.  While  the  use  of  the  administrative  records  method 
has  shown  mixed  results,  there  is  no  question  about  its  potential 
as  a  valuable  addition  to  the  list  of  standard  estimating 
techniques.

As an aside, I would like to propose that the Census Bureau
give  serious  consideration  to  making  aggregate  data  from 
administrative  records  available  to  other  organizations  which  are 
involved  in  population  estimates.  At  present,  access  to  certain 
kinds  of  administrative  records  is  restricted  largely  to  the 
Bureau  itself;  yet  many  analysts  outside  the  Bureau  would  find 
such  data  useful  in  the  work  they  do. 

A  point  which  I  confess  puzzles  me  is  Mr.  Cavanaugh's 
reference  to  the  use  of  the  Federal  income  tax  return  and  Social 
Security  data  in  preparing  estimates  of  the  racial  composition 
of  metropolitan  areas.  Since  the  testing  of  a  method  or  methods 
for  preparing  estimates  for  this  purpose  has  been  placed  in  the 
highest  priority  level  for  the  1980  tests  of  methods,  it  is  assumed 
that  Census  staff  must  have  discussed  racial  estimating  pro- 
cedures extensively.  Nearly  all  of  us  around  the  country  who 
have  responsibility  for  making  population  estimates  frequently 
are  asked  questions  about  the  racial  composition  of  metropolitan 
areas,  and  the  development  of  methods  for  arriving  at  such  esti- 
mates would  indeed  be  most  welcome. 

HOUSING  UNIT  METHOD 

In  reviewing  the  accumulated  experience  from  previous 
tests  of  methods,  Mr.  Cavanaugh  noted  that  "improvements  in 
the  estimation  system  occur  when  smaller  or  lower  levels  of 
geography  are  controlled  to  higher  levels  of  geography."  While 
this  statement  may  hold  true  most  of  the  time,  I  suspect  there 
are  a  few  exceptions.  For  example,  when  counties  in  a  State 
are controlled to a State total, and when a handful of
fast-growing counties are being underestimated by one of the
standard estimating methods, those counties may have more inaccurate
estimates  when  controlled  to  the  State  total  than  they  would 
without  such  control. 
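
A small numerical illustration of this exception, using invented figures and hypothetical names: when the raw county estimates sum to more than the State total, controlling scales every county down, including a fast-growing county that was already underestimated.

```python
def control_to_total(raw_estimates, state_total):
    """Scale county estimates proportionally so they sum to the State total."""
    factor = state_total / sum(raw_estimates)
    return [e * factor for e in raw_estimates]


actual = [200_000, 100_000, 100_000]   # first county is the fast-growing one
raw = [180_000, 115_000, 115_000]      # fast county under, the others over

controlled = control_to_total(raw, state_total=sum(actual))

error_before = abs(raw[0] - actual[0])
error_after = abs(controlled[0] - actual[0])
print(error_before, round(error_after))  # 20000 24390
```

In this constructed case the two slower counties move closer to their actual values after control, while the fast-growing county moves farther away.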

The  primary  level  at  which  I  wonder  about  that  statement 
by  Mr.  Cavanaugh,  though,  involves  the  use  of  the  housing  unit 
method  to  prepare  census  tract  estimates  of  population.  In  a 
few  parts  of  the  Nation— Houston  among  them— data  from  the 
electric  utility  companies  provide  an  accurate  count  of  new 
units  put  in  place. 

In  Houston,  we  use  new  residential  electrical  connections  to 
tabulate  additions  to  the  housing  stock  by  census  tract.  (A  new 
residential  electrical  connection  is  defined  as  the  first  time  a 
meter  is  put  in  place  to  service  a  new  housing  unit  with  elec- 
tricity.) New  residential  electrical  connections  distinguish 
between  single-family  and  multi-family  housing,  providing 
an  excellent  continuing  gross  inventory  of  housing. 

Since completions (plus mobile homes) are an accomplished
fact, one source of error in the more normal procedure, which
uses  building  permits,  is  removed.  When  losses  from  housing 
inventory  can  be  tracked  or  reasonably  estimated,  and  when 
adjustments  are  made  for  declining  household  size  and  for 
vacancies,  the  housing  unit  method  may  offer  acceptably 
accurate  estimates  for  small  subcounty  areas. 
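
The housing unit calculation described above can be written out as a short sketch. The function name, argument names, and figures below are illustrative assumptions rather than the Houston procedure itself; group quarters population and other refinements are left out.

```python
def housing_unit_estimate(base_units, new_connections, losses,
                          vacancy_rate, persons_per_household):
    """Estimate tract population from a running housing inventory."""
    # Current stock: base inventory plus new electrical connections,
    # minus demolitions and other losses.
    current_units = base_units + new_connections - losses
    # Adjust for vacancies, then convert occupied units to persons.
    occupied_units = current_units * (1.0 - vacancy_rate)
    return occupied_units * persons_per_household


# Invented tract: 1,000 units at the last census, 200 new connections,
# 50 units lost, 10 percent vacant, 2.5 persons per occupied unit.
print(round(housing_unit_estimate(1_000, 200, 50, 0.10, 2.5), 1))  # 2587.5
```

Declining household size enters through `persons_per_household`, which would need to be updated for each estimate date rather than held at its census value.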

We  will  compare  estimates  from  the  housing  unit  method  for 
census  tracts  in  the  Houston  area  with  the  final  1980  census 
data  to  get  a  gauge  of  how  well  this  procedure— which  does  not 
control    tract    estimates   to  a  county   estimate— works.    In  the 


local  review  of  the  1980  census  data,  we  are  finding  a  high 
percentage of tracts in which both the dwelling-unit counts and
the  population  counts  suggest  that  our  estimates  will  compare 
favorably  with  the  final  figures. 


WEIGHTING  ESTIMATES 

Mr.  Cavanaugh  mentioned  the  procedure  Eugene  P.  Ericksen 
has  proposed  to  optimize  weights  assigned  in  averaging  inde- 
pendent estimates.  The  current  procedure  of  using  unweighted 
averages  makes  the  implicit  assumption  that  none  of  the  methods 
used  is  a  better  estimator  than  any  other,  and  Ericksen  un- 
doubtedly is  correct  in  challenging  that  notion.  The  alternative 
of  weighting  the  averages,  however,  makes  a  different  implicit 
assumption— that  historical  patterns  will  continue  (meaning  that 
the  weights  which  offered  the  most  accurate  estimates  in  the 
past  will  offer  the  most  accurate  estimates  in  the  future).  At 
intervals  of  a  year  or  so,  this  assumption  probably  is  reasonable; 
but  if  we  look  at  decade  intervals,  it  may  be  subject  to  serious 
question. 

Presumably,  what  would  be  done  is  to  derive  weights  which 
offered  the  best  estimates  for  1970-1980,  and  then  use  these 
during  1980-1990.  An  appropriate  test  would  be  to  derive 
weights  for  1960-1970,  and  then  to  apply  them  to  1970-1980 
to  see  how  well  the  relationships  held.  For  areas  of  the  Nation 
which  experienced  rapid  economic  growth  or  decline,  the  1960- 
1970  weights  may  not  be  very  good  predictors  for  1970-1980. 
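
The proposed test can be sketched in code. Note the assumptions: Ericksen's actual weighting procedure is not described here, so the sketch substitutes a simple stand-in, weights inversely proportional to each method's mean squared percentage error in the earlier decade. All names and figures are invented for illustration.

```python
def inverse_mse_weights(past_errors):
    """Derive weights from each method's errors in an earlier period.

    past_errors maps a method name to its percentage errors; every method
    is assumed to have at least one nonzero error.
    """
    inverse = {
        method: len(errors) / sum(e * e for e in errors)
        for method, errors in past_errors.items()
    }
    total = sum(inverse.values())
    return {method: value / total for method, value in inverse.items()}


def combine(estimates, weights):
    """Weighted average of independent estimates for one area."""
    return sum(weights[method] * estimates[method] for method in estimates)


# Step 1: derive weights from invented 1960-1970 errors (percent).
weights = inverse_mse_weights({
    "ratio-correlation": [2.0, -1.0, 3.0],
    "administrative records": [4.0, -5.0, 3.0],
})

# Step 2: apply them to invented estimates for one county at the end of
# 1970-1980 and compare with the unweighted average.
estimates_1980 = {"ratio-correlation": 102_000, "administrative records": 98_000}
weighted = combine(estimates_1980, weights)
unweighted = sum(estimates_1980.values()) / len(estimates_1980)
print(round(weighted), round(unweighted))  # 101125 100000
```

Comparing both averages with the 1980 census count for the same county would then show whether weights derived from the earlier decade held up in the later one.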

Another  important  concern  involves  the  levels  of  geography 
for  which  separate  weights  might  be  prepared.  Certainly,  it 
might  be  highly  desirable  to  have  local  weights,  since  economic 
and  demographic  characteristics  of  State  and  sub-State  areas 
varied  in  the  1970's. 


CENSUS UNDERCOUNT 

In  the  introduction  to  his  paper,  Mr.  Cavanaugh  referred  to 
the  need  for  a  reliable  benchmark  for  testing  methods,  and  he 
mentioned  the  possible  use  of  the  results  of  those  tests  in  modi- 
fying and  retooling  the  methods.  He  expressed  confidence  that 
the  1980  census  will  prove  to  be  an  adequate  benchmark  for 
the  full-scale  tests  of  methods.  Here  I  would  like  to  record  a 
difference  of  opinion. 

In  February  1980,  I  attended  a  Census  Bureau-sponsored 
conference  on  the  census  undercount— and  what  to  do  about 
it— in  Washington.  At  this  conference,  there  was  unanimous 
concern  about  the  undercount,  and  widespread  dissent  on 
how  to  adjust  for  undercounts— if  any  adjustment  is  made  at  all. 

This  concern  about  undercounts  reflects  common  knowledge 
that  the  census  is  not  absolutely  accurate— and  we  all  know  that 
there  are  many  ways  for  underenumeration  to  occur.  We  have 
a tendency to accept census counts as a matter of convention,
pretending, as it were, that they are entirely on target.

If it is decided to adjust for undercount in the 1980 census,
how will this adjustment affect the tests of methods? The two
benchmarks, the 1970 and 1980 censuses, will not offer
comparable numbers, and the decision to adjust would acknowledge
that the "raw" counts are inaccurate. Can one reliably test
against an uncertain standard?

Assume  that  the  unadjusted  census  population  count  for  a 
county  is  1  million  people,  and  that  the  tests  of  methods 
revealed  that  the  estimating  methods  being  used  for  this  county 
yielded  results  close  to  the  census  count.  Also  assume  that  2 
years  later  the  count  for  the  county  is  adjusted  to  1,200,000. 
What  happens  to  earlier  observations  and  conclusions  relating  to 
the  accuracy  of  the  methods?  Will  the  methods  be  adjusted  to 
the  new  benchmark? 


DISSEMINATING  TEST  RESULTS 

A  welcome  comment  in  Mr.  Cavanaugh's  presentation 
involves  the  Census  Bureau's  plans  for  widespread  distribution 
of  the  test  results.  Many  users  of  the  Bureau's  population 
estimates— especially  those  in  the  private  sector  and  those  in 
local  governments  who  are  relying  increasingly  on  such  data— 
will  appreciate  an  intensified  effort  to  circulate  the  results 
among  the  maximum  number  of  audiences.  The  Bureau  is  to 
be  commended  for  this  approach. 


REVENUE  SHARING  LAWSUITS 

It  occurs  to  me  that,  from  one  perspective,  an  argument 
could  be  made  that  testing  of  methods  could  have  undesirable 
consequences— though  I  am  not  making  any  suggestion  whatso- 
ever that  the  tests  be  abandoned. 

Suppose  that  the  1980  tests  of  methods  reveal  that  the  popu- 
lation estimates  prepared  for  a  county  during  the  1970's  were 
underestimates.  This  would  mean  that  each  year  in  the  1970's 
this  county  received  less  in  Federal  grant  funds  than  it  should 
have  received— to  name  only  one  type  in  a  series  of  shortages 
experienced  by  the  county   because  of  the  estimation  errors. 

Then  this  question  is  suggested:  What  recourse  will  the 
county  have  if  the  1980  census  tests  of  methods  clearly  indi- 
cate that  the  Census  Bureau's  population  estimates  for  the 
county  were  too  low?  Will  we  find  State  and  local  governments 
filing  suit  to  recover  funds  which  they  were  denied  through 
estimating  error? 

ADJUSTING  THE  ESTIMATES  IN  THE  1980's 

One  possible  consequence  of  the  tests  of  methods,  using  the 
1980  census  data,  is  the  capability  to  produce  "adjustment 
factors"  for  localities  so  that  estimates  from  each  method  used 
in  the  1980's  could  be  altered  to  allow  for  the  kinds  of  errors 
which  occurred  in  the  1970's. 

What  I  am  suggesting  here  is  that,  just  as  it  may  be  possible 
to  adjust  for  census  undercount,  it  also  might  be  possible  to 
adjust  for  subsequent  Census  Bureau  underestimates  or 
overestimates. 

The  question  I  would  like  to  raise  at  this  time  is  this:  Has 
there  been  any  discussion  about  adjusting  Census  Bureau 
population  estimates  for  estimation  error  as  determined  from 
the 1980 census tests of methods?

LOCAL  REVIEW  RESULTS 

Another  question  I  would  like  to  address  to  Mr.  Cavanaugh 
involves  the  local  review  data  from  the  1980  census.  Now  that 
the  preliminary  counts  are  almost  complete,  do  you  have  any 
feeling for how accurate the Census Bureau's intercensal population
estimates have been?


THE  PROPOSED  INDEX  OF  MISALLOCATION 

Before  beginning  a  discussion  of  David  Swanson's  paper,  I 
would  like  to  state  that  I  have  not  had  any  experience  in  the 
actual  use  of  an  Index  of  Misallocation.  My  comments  and 
questions,  therefore,  are  based  entirely  on  reading  Mr.  Swanson's 
paper,  and  I  trust  he  will  have  an  opportunity  to  respond  to 
any  item  included  in  my  remarks. 

I  agree  with  the  notion  of  applying  an  Index  of  Misallocation 
(IM)  to  population  estimates  used  for  fund  allocation,  but  I  also 
would  like  to  register  some  concerns  about  the  value  of  doing 
so. 

Perhaps  reading  into  Mr.  Swanson's  paper  something  that  is 
not  there,  I  get  the  impression  that  the  application  of  IM  between 
censuses  would  involve  determining  which  of  two  or  more 
competing methods produced a preliminary figure for the preceding
year which was closest to the revised estimate for the
same  year.  The  "best"  method,  by  this  criterion,  then  would  be 
used  for  the  subsequent  year. 

My  concerns  focus  on  areas  experiencing  large  population 
change  due  to  migration,  because  such  areas  are  the  ones  in 
which  estimating  errors  usually  are  greatest.  I  would  like  to 
suggest  these  problems: 

1.  Since  there  are  grave  doubts  about  the  accuracy  of  estimates 
for  areas  which  have  experienced  large  population  change, 
the  revised  estimates  from  previous  years  do  not  offer  a 
good  empirical  standard  against  which  to  gauge  the  accuracy 
of  the  allocation  procedure. 

2.  The  revised  estimates  may  not  always  be  more  accurate  than 
the  preliminary  estimates,  which  would  put  the  use  of  IM  in 
the  position  of  increasing  the  extent  of  misallocation. 

3. If the population estimates contain substantial error regardless
of whether or not IM is used, then IM offers only
marginal improvements, if any, in comparison with the
actual population distribution.

If  the  foregoing  three  statements  are  acceptable,  then  a  likely 
conclusion is that IM may not offer more than marginal improvements
in allocation equitability because it would work best in
areas with relatively little population change (where any
improvements would be marginal), while the greater the proportional
population change, the greater the probable errors of
estimate  in  any  event. 

OTHER  QUESTIONS  ABOUT  IM

Now,  let  us  turn  to  other  questions  relating  to  Mr.  Swanson's 
paper.  Admittedly,  some  of  the  questions  may  be  interrelated 
and  may  overlap.  IM  gives  a  measure  of  allocation  accuracy— but 
why  should  misallocation  represent  a  more  important  type  of 
accuracy  than,  say,  mean  square  error  (MSE)  or  mean  absolute 
percent  error  (MAE)?  Suppose  that  MSE  and  MAE  suggest 
one  model  while  IM  suggests  another.  In  this  case,  while  one 
might  argue  that  the  use  of  IM  is  preferred,  it  should  be  noted 
that  one  or  two  counties  could  be  subjected  to  particularly 
large  misallocations  in  order  to  accept  a  minimization  of  IM. 
In  such  a  case,  why  should  it  be  more  important  to  minimize 
IM  than  MSE  or  MAE? 
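
To make the comparison of criteria concrete, the sketch below computes MSE and mean absolute percent error for two hypothetical models, together with a share-based misallocation measure. The share-based formula (half the sum of absolute differences between estimated and actual population shares) is an assumption used only for illustration and is not taken from Mr. Swanson's paper; all county figures are invented.

```python
def mean_square_error(estimates, actuals):
    """Average squared difference between estimates and actual counts."""
    return sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals)


def mean_absolute_percent_error(estimates, actuals):
    """Average absolute error, expressed as a percent of each actual count."""
    return 100.0 * sum(abs(e - a) / a for e, a in zip(estimates, actuals)) / len(actuals)


def share_misallocation(estimates, actuals):
    """ASSUMED index: percent of the total that lands in the wrong area
    when allocation follows estimated rather than actual shares."""
    total_e, total_a = sum(estimates), sum(actuals)
    return 50.0 * sum(abs(e / total_e - a / total_a) for e, a in zip(estimates, actuals))


# Hypothetical State: one large county and two small ones.
actual = [1_000_000, 10_000, 10_000]
model_a = [1_020_000, 10_000, 10_000]   # 2 percent error in the large county
model_b = [1_000_000, 12_000, 12_000]   # 20 percent error in each small county

for name, est in (("A", model_a), ("B", model_b)):
    print(name,
          round(mean_square_error(est, actual)),
          round(mean_absolute_percent_error(est, actual), 2),
          round(share_misallocation(est, actual), 3))
```

In this invented case, MSE favors model B while the percent-error and share-based measures favor model A, which shows how the choice of criterion can change the choice of model.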

In  a  friendly,  diplomatic  manner,  I  frequently  have  expressed 
a difference of opinion in talking with State of Texas representatives
in Austin who make statewide population estimates. My
impression is that the methodology used by these State estimators
is more generally applicable to rural counties with
relatively  slower  population  change  than  the  more  dynamic 
urban  areas.  Therefore,  as  I  read  Mr.  Swanson's  paper,  I  raised 
these  questions:  Is  IM  biased?  Does  it  introduce  "size 
dependence"  into  accuracy  measurements?  (That  is,  does  IM 
give  differential  weights  to  individual  counties,  depending  on 
their  relative  sizes?)  If  this  is  the  case,  how  wise  is  it  to  have  a 
measure  like  IM  that  is  size-dependent? 

The last question I wish to direct to Mr. Swanson involves
model selection. In the data he uses as an example, he shows
how one of the 1960/1950-based models (in its first stage of
evaluation) is preferable in terms of its lower IM value while
being inferior in terms of its traditional evaluation criteria
(R² and SEE). This preferred model also produces subsequent
estimates that have lower IM values.

However,  suppose  that  a  comparison  between  two  models 
revealed  that  model  A  had  a  superior  IM  value  in  its  first-stage 
evaluation  but  an  inferior  IM  value  in  its  second-stage  evaluation. 
Suppose  that  we  now  have  obtained  1980  data  and  can  use  this 
new information to generate updated versions of model A and
model B for post-1980 estimates. Suppose further that, once
again, model A has a lower IM value in its first stage. The
question is: Would you prefer to use the complete but dated
results of the earlier first- and second-stage evaluations and
select  model  B,  or  would  you  ignore  the  earlier  results  and 
accept  the  new  version  of  model  A? 


CONCLUDING  COMMENTS 

This  concludes  my  prepared  remarks  in  relation  to  these 
two  very  fine  papers.  These  presentations  relate  to  matters  of 
great  significance  to  all  of  us. 

I  have  no  basic  quarrel  with  either  paper.  The  concerns 
and  questions  I  have  mentioned  may  be  due  either  to  lack  of 
information  on  my  part  or  to  my  misunderstanding. 

While  both  authors  suggest  areas  for  further  research  and 
improvement,  they  are  moving  in  the  right  direction,  and  I 
am  sure  the  audience  shares  my  appreciation  for  the  excellent 
work  they  are  doing. 

With the forthcoming release of the 1980 census, we will once
again  have  timely  and  detailed  information  on  small  areas 
throughout  the  United  States.  While  many  users  of  local-area 
statistics  would  like  to  have  more  frequent  updates  of  these 
important  statistics,  the  pending  release  will,  at  a  minimum, 
provide  a  once-in-a-decade  opportunity  for  detailed  analysis  of 
trends  and  characteristics  of  local  areas  throughout  our  vast 
country. 

The  first  paper  in  this  session  concerns  the  new  definitions 
of standard metropolitan areas. As the chairperson of the
Federal Committee on Standard Metropolitan Statistical Areas,
I obviously have a special interest in this topic. However,
I  would  like  to  underscore  the  importance  of  the  new  structure 
which  is  to  be  introduced  as  a  result  of  these  new  criteria.  The 
new  structure  has  been  designed  to  be  more  flexible  than  earlier 
standards,  yet  at  the  same  time  it  was  designed  to  be  compatible 
with  the  earlier  definitions. 


During  the  past  decade,  the  use  of  the  SMSA  concept  has 
grown  significantly.  In  addition  to  its  traditional  purpose  as  a 
concept  for  standardizing  the  definitions  of  the  "greater  metro 
area" around our major communities, it is now used in determining
qualifications for selected Federal programs and as a
major  reference  for  identifying  markets.  I  am  looking  forward 
to  the  discussion  today,  to  learn  of  your  reactions  to  the  latest 
statistical  standards  for  defining  important  metropolitan  areas. 

The  paper  by  Arnold  Reznek  and  Randall  Spoeri  of  the 
Census  Bureau  is  of  special  interest  since  it  outlines  a  strategy 
for making full use of available statistical resources in metropolitan
or regional analysis. It is complemented by the third
paper, by John Morawetz, which describes an important new
data resource developed by the private sector.
With  the  many  pressures  on  the  Federal  government  to  collect 
small  area  information,  it  is  of  special  significance  to  see  how 
local  administrative  records  and  private  data  collection  can  be 
used  to  aid  the  local  analyst. 

Using  the  New  Metropolitan  Statistical  Area  Definitions

Jonah  Otelsberg 

City  University  of  New  York 

and 

Joseph  W.  Duncan 
Office  of  Federal  Statistical  Policy  and  Standards 


BACKGROUND 

In 1982, a new classification of metropolitan statistical areas
will  be  established  based  on  new  standards  for  designating  and 
defining  such  areas  for  Federal  statistical  purposes.  The  new 
structure  places  greater  emphasis  on  consolidated  metropolitan 
statistical  areas,  the  building  blocks  for  which  will  be  labeled 
"primary  metropolitan  statistical  areas."  In  addition,  a  set  of 
metropolitan  statistical  areas,  which  are  relatively  freestanding, 
will  also  be  defined.  This  new,  more  flexible  classification  for 
metropolitan  statistical  areas  should  improve  analysis  and 
provide  greater  flexibility  in  the  presentation  of  metropolitan 
area  statistics.  This  paper  will  (1)  outline  the  new  structure  and 
the  process  for  implementing  the  new  standards;  (2)  illustrate 
the  use  of  the  new  classification  in  ranking  metropolitan  areas; 
and  (3)  discuss  some  of  the  implications  of  the  new 
classification. 

THE  NEW  STRUCTURE  FOR  METROPOLITAN 
STATISTICAL  AREAS 

Although  fundamental  concepts  of  a  metropolitan  area  will 
remain  unchanged  after  the  1980  census,  there  are  some 
important  changes  in  the  standards,  which  reflect  both  the 
evolution  of  the  metropolitan  structure  since  1970  and  the 
growing  use  of  metropolitan  area  data. 

The chief structural change is the replacement of a single set
of metropolitan statistical areas with three sets. One set, the
"primary metropolitan statistical areas" (PMSA's), includes
those areas that are most closely associated with their metropolitan
neighbors. The second set, the "consolidated metropolitan
statistical areas" (CMSA's), represents groupings of these
PMSA's into large metropolitan or "megalopolitan" complexes.
The third set, the relatively freestanding "metropolitan statistical
areas" (MSA's), includes those areas which are not closely
associated with other areas. These areas are typically surrounded
by nonmetropolitan counties. The familiar term SMSA
(standard metropolitan statistical area) will be replaced with
these three new terms.


Although  consolidated  areas  are  not  a  new  concept,  previous 
Federal  data  on  metropolitan  areas  have  paid  less  attention  to 
them  than  is  likely  to  occur  after  1980.  By  the  late  1950's,  the 
growth  of  metropolitan  areas  had  created  zones  of  continuous 
or  semicontinuous  metropolitan  development,  which  were 
labeled  "megalopolitan,"  after  publication  in  1961  of  Jean 
Gottmann's  book  Megalopolis.  The  term  was  not  always  used 
with  care.  In  one  view,  the  entire  area  from  Boston  to 
Washington  was  labeled  "Boswash"  and  was  described  as  a  single 
urbanized  area. 

Nevertheless,  many  regional  analysts  began  to  recognize  the 
greater  interdependence  of  some  closely  related  metropolitan 
areas.  Two  such  areas  (with  New  York  and  Chicago  at  their 
centers)  appeared  in  census  publications  for  1960  and  1970,  and 
a  need  began  to  arise  for  additional  statistics  on  consolidated 
metropolitan  areas. 

In  1975,  the  Office  of  Management  and  Budget  published 
criteria  and  definitions  for  "standard  consolidated  statistical 
areas."¹ They were defined as two or more contiguous SMSA's
which meet the following criteria of size, urban character,
integration, and contiguity of urbanized areas:

"1) One of the standard metropolitan statistical areas has
    a population of at least one million.
"2) At least 75 percent of the population of each
    standard metropolitan statistical area is urban.
"3) The sum of the number of workers commuting
    between the two standard metropolitan statistical
    areas is equal to:

    "a) at least 15 percent of the employed workers
        residing in the smaller standard metropolitan
        statistical area, or
    "b) at least 10 percent of the employed workers
        residing in the smaller standard metropolitan
        statistical area, and

        "i) the urbanized area of a central city of one
            standard metropolitan statistical area is
            contiguous with the urbanized area of a central
            city of the other standard metropolitan
            statistical area, or

        "ii) a central city in one standard metropolitan
            statistical area shares the same urbanized
            area with a central city in the other standard
            metropolitan statistical area."

¹ Executive Office of the President, Office of Management and
Budget, Standard Metropolitan Statistical Areas, 1975, pp. 59-61.

To  provide  a  systematic  basis  for  arriving  at  standard 
definitions,  the  new  standards  incorporate  the  1975  SCSA 
criteria,  with  certain  changes,  into  those  that  determine  the 
overall  extent  of  the  "greater"  metropolitan  area.  The  result  is 
to  establish  "greater"  metropolitan  areas  that  typically  combine 
all  counties  included  in  a  single  urbanized  area,  plus  those 
counties  with  the  qualifying  level  of  commuting  to  the  central 
core.  The  use  of  the  SCSA  rules  also  combines  as  single 
"greater"  areas  certain  neighboring  MSA's  that  have  separate 
urbanized  areas  but  a  substantial  commuting  interchange. 

The  new  standards  have  altered  the  1975  SCSA  criteria  in 
two respects. Each metropolitan statistical area in a consolidated
area must have an urban percentage of at least 60, not 75
as  formerly;  a  total  of  at  least  1  million  population  is  required 
for  the  entire  consolidated  area,  rather  than  for  at  least  one  of 
the  component  areas,  as  previously.  These  changes  will  increase 
somewhat  the  number  of  consolidated  areas  and  add  a  few 
additional  metropolitan  statistical  areas  to  existing  consolidated 
areas. 
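
The 1975 criteria and the two changes just described can be sketched as a single test for a pair of areas. This is a simplified reading for illustration only: the class and function names are hypothetical, the revised total-population rule is applied to just two areas, and the example figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Area:
    population: int
    urban_percent: float
    employed_residents: int


def qualifies_as_consolidated(a: Area, b: Area, commuters_between: int,
                              ua_contiguous: bool = False,
                              shared_ua: bool = False,
                              revised: bool = False) -> bool:
    """Apply the size, urban-character, and integration tests to two areas."""
    if revised:
        # New standards: 1 million for the combined area, 60 percent urban.
        size_ok = a.population + b.population >= 1_000_000
        min_urban = 60.0
    else:
        # 1975 criteria: 1 million in at least one area, 75 percent urban.
        size_ok = max(a.population, b.population) >= 1_000_000
        min_urban = 75.0
    urban_ok = a.urban_percent >= min_urban and b.urban_percent >= min_urban

    # Commuting in both directions, as a percent of workers living in the smaller area.
    smaller = a if a.population <= b.population else b
    share = 100.0 * commuters_between / smaller.employed_residents
    integration_ok = share >= 15.0 or (share >= 10.0 and (ua_contiguous or shared_ua))
    return size_ok and urban_ok and integration_ok


# Invented pair: neither area has 1 million people, and one is only 70 percent urban.
larger = Area(population=800_000, urban_percent=82.0, employed_residents=350_000)
smaller = Area(population=300_000, urban_percent=70.0, employed_residents=120_000)
print(qualifies_as_consolidated(larger, smaller, 20_000))                # 1975 criteria
print(qualifies_as_consolidated(larger, smaller, 20_000, revised=True))  # new standards
```

The invented pair fails the 1975 test on both size and urban percentage but passes under the revised thresholds, which is the kind of case behind the expected increase in consolidated areas.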

The  primary  metropolitan  statistical  areas  can  be  considered 
as  components  of  the  consolidated  metropolitan  statistical 
areas.  These  areas,  however,  will  retain  their  separate  identities 
and  the  statistics  relating  to  them,  reflecting  the  fact  that  they 
constitute  important  metropolitan  entities  in  their  own  right. 

Another  significant  change  for  the  1980's  will  be  the 
introduction  of  four  levels  of  metropolitan  statistical  areas.  The 
Federal  Committee  on  Standard  Metropolitan  Statistical  Areas 
recognized that SMSA's differ widely in their urban characteristics.
Areas of 1 million people or more have enough special
characteristics  to  justify  placing  them  in  a  separate  class,  to  be 
known  as  Level  A. 

A  second  level  was  recognized  on  the  basis  of  studies  showing 
that metropolitan centers of at least a quarter million population
typically exhibit a wide regional influence, evidenced by
large  banks,  many  regional  headquarters,  and  Sunday  newspaper 
circulation  covering  a  wide  tributary  area.  Level  B  includes 
those  metropolitan  statistical  areas  with  population  from 
250,000  to  1  million. 

Level  C  identifies  metropolitan  areas  with  from  100,000  to 
250,000  people.  The  remaining  SMSA's  below  100,000  will  be 
defined  as  Level  D  metropolitan  statistical  areas.  In  1979,  there 
were  37  SMSA's  with  a  census  population  of  less  than  100,000. 
In  1982,  when  the  1980  standards  are  applied,  no  primary 
metropolitan  statistical  area  is  projected  to  be  below  100,000 
(except  for  two  in  New  England,  where  the  cutoff  is  75,000), 
and  about  37  metropolitan  statistical  areas  will  have  less  than 
100,000 population. Even though these areas are relatively
small, and their metropolitan characteristics are often limited,
many  of  them  are  important  marketing  centers  for  relatively 
sparsely  settled  rural  regions. 
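
The four population levels can be expressed as a simple lookup. The sketch below is illustrative: the function name is hypothetical, and the treatment of an area falling exactly on a boundary (for example, exactly 250,000) is an assumption, since the text gives only the ranges.

```python
def metropolitan_level(population: int) -> str:
    """Return the level (A-D) for a metropolitan statistical area."""
    if population >= 1_000_000:
        return "A"   # 1 million or more
    if population >= 250_000:
        return "B"   # 250,000 to 1 million
    if population >= 100_000:
        return "C"   # 100,000 to 250,000
    return "D"       # below 100,000


# 1978 estimates for three Ohio areas, plus a hypothetical area of 80,000.
for name, pop in (("Columbus, OH", 1_088_900), ("Canton, OH", 403_700),
                  ("Mansfield, OH", 130_300), ("hypothetical small area", 80_000)):
    print(name, metropolitan_level(pop))
```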

The  identification  of  levels  of  metropolitan  areas  was  an 
outgrowth  of  the  Committee's  discussions  concerning  how  large 
an  urban  area  must  be  to  justify  classification  as  metropolitan. 
Urban  centers  with  a  population  as  small  as  50,000  had  been 
recognized  by  the  official  metropolitan  criteria  ever  since  1930. 
While  the  requirements  for  qualification  have  been  made  more 
stringent  in  the  new  standards,  the  Committee  decided  not  to 
change  to  a  higher  cutoff  level,  because  many  existing  areas 
would  be  disqualified.  Instead,  the  Committee  decided  to 
identify  four  levels  of  areas  and  to  adopt  the  100,000 
population  level  for  qualification  of  new  metropolitan  statistical 
areas. 

Furthermore,  the  Committee  believed  that  the  identification 
of  four  levels  would  emphasize  to  users  that  metropolitan  areas 
differ  in  character,  depending  on  the  population  size.  Hence,  the 
Committee  believed  that  it  would  be  useful  to  present  statistics 
for metropolitan areas broken down by population size categories.
The levels also help users limit a study to the largest
areas, or to focus on smaller or middle-sized areas.

The  1980  standards  use  the  Census  Bureau's  urbanized  areas 
to  determine  those  areas  large  enough  to  qualify  for  recognition 
as  metropolitan  statistical  areas.  While  the  urbanized  area  and 
the  metropolitan  statistical  area  are  related  concepts,  the 
urbanized  area  has  a  more  limited  geographic  extent.  The 
urbanized  area  is  defined  to  include  the  physically  continuous 
built-up  area  around  each  large  city,  and  thus  consists  of  high 
and medium density development in the core of the metropolitan
area. The metropolitan statistical area, however, is geographically
larger than the urbanized area because it includes
discontinuous  urban  and  suburban  development  beyond  the 
urbanized  area  as  defined.  In  addition,  it  may  also  include  some 
open  country  areas  from  which  many  residents  commute  to 
work  in  the  central  city  or  its  immediate  suburbs. 

The  Federal  Committee  on  Standard  Metropolitan  Statistical 
Areas  decided  that  the  urbanized  area  provided  a  more  standard 
approach  for  qualifying  areas  for  metropolitan  designation  than 
the  old  criteria.  These  areas  are  defined  nationwide  at  the  time 
of  each  national  census.  Hence,  they  provide  a  more  consistent 
basis  for  identifying  concentrations  of  population  than  is 
possible using the limits of incorporated cities, whose boundaries
tend to vary regionally due to local annexation practices.
The  standards  also  use  the  urbanized  area  as  the  basis  for 
determining  the  central  counties  to  which  commuting  from 
outlying  counties  is  measured. 

IMPLEMENTATION  OF  THE  NEW  STANDARDS 

Preliminary  data  from  the  1980  census  will  soon  begin  to  be 
released. Initial (unpublished) preliminary counts are being
made available for review by local officials. After some revisions
have been made, the Bureau will publish preliminary counts one
State at a time. By the end of this year, the Census Bureau will
prepare  its  final  National  and  State  population  counts  and  report 
them  to  the  President  by  January  1,  1981. 

As the final 1980 census population counts for cities,
counties, and urbanized areas become available from the Bureau
of the Census, the new standards² will be used to designate and
define  new  areas  that  qualify  under  the  requirements  for 
population  size,  central  counties,  outlying  counties,  and  central 
cities.  Most  of  these  new  areas  will  have  an  urbanized  area  of  at 
least  50,000  population,  and  a  total  metropolitan  area 
population  of  at  least  100,000  (75,000  in  New  England).  If  the 
urbanized  area  contains  a  city  of  at  least  50,000  population,  the 
population  size  requirement  is  met,  regardless  of  the  total 
metropolitan  area  population.  Each  new  area  will  include  the 
county  containing  the  urbanized  area's  central  city,  and  other 
counties  (if  any),  at  least  half  of  whose  population  is  included 
within  the  urbanized  area.  Outlying  counties  will  be  included 
using  1980  data  on  population  density,  percentage  urban,  and 
other  criteria  of  metropolitan  character,  in  conjunction  with 
1970  census  commuting  data.  (Commuting  data  for  1980  will 
not  be  available  until  late  1981;  therefore,  the  1970  commuting 
data  will  be  used  as  an  initial  indication  of  social  and  economic 
integration  of  outlying  counties  with  the  central  counties.) 
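
The population size requirement described in this paragraph can be sketched as follows. Only the size test is covered; the central-county, outlying-county, and commuting rules are not modeled, and the function and parameter names are hypothetical.

```python
def meets_size_requirement(urbanized_area_pop: int, total_metro_pop: int,
                           largest_city_pop: int, new_england: bool = False) -> bool:
    """Population size test for a newly designated metropolitan statistical area."""
    # A city of at least 50,000 satisfies the requirement outright.
    if largest_city_pop >= 50_000:
        return True
    # Otherwise: an urbanized area of 50,000 plus a minimum total population.
    minimum_total = 75_000 if new_england else 100_000
    return urbanized_area_pop >= 50_000 and total_metro_pop >= minimum_total


# Invented examples.
print(meets_size_requirement(55_000, 90_000, 52_000))                    # city of 50,000
print(meets_size_requirement(55_000, 90_000, 30_000))                    # total too small
print(meets_size_requirement(55_000, 90_000, 30_000, new_england=True))  # New England cutoff
```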

The  standards  for  Standard  Consolidated  Statistical  Areas 
(SCSA's)  also  will  be  implemented  when  the  final  counts 
become  available.  Existing  SCSA's  will  be  expanded  and  new 
consolidated  areas  will  be  recognized  on  the  basis  of  1970 
commuting  and  1980  total  population  and  percentage  urban. 
All  such  changes  in  Standard  Consolidated  Statistical  Areas  will 
be  reflected  in  the  initial  publications  of  data  from  the  1980 
census. 

The  present  terminology  of  "Standard  Metropolitan 
Statistical  Areas"  and  "Standard  Consolidated  Statistical  Areas" 
will  be  retained  in  the  first  stage  of  the  standards 
implementation. 

The  second  stage  for  implementing  the  new  standards  will 
occur  after  the  commuting  data  from  the  1980  census  become 
available.  This  will  probably  occur  in  late  1981  or  early  1982. 
At  that  time,  the  Federal  Committee  on  Standard  Metropolitan 
Statistical  Areas  will  review  the  boundaries  of  all  areas  using  the 
new  standards.  In  some  cases,  counties  will  be  added  and,  in 
other  cases,  counties  will  be  deleted  based  on  their  level  of 
commuting  to  the  central  counties  of  the  area  and  their  degree 
of  metropolitan  character. 

At  this  stage  there  will  also  be  a  review  of  those  areas  which 
may  qualify  for  recognition  as  primary  metropolitan  statistical 
areas  within  consolidated  areas  of  1  million  population  or  more. 
Final  definition  of  those  areas  will  be  made  provided  there  is 
substantial  local  support  for  separate  recognition  for  Federal 
statistical  purposes.  Also,  at  this  time,  areas  will  be  classified 
into  levels  as  described  above.  (Level  A  will  be  defined  as  those 
areas  of  1  million  or  more  population;  Level  B  will  be  defined  as 
those  areas  with  250,000  to  1  million;  Level  C  will  be  defined 
for  those  areas  with  100,000  to  250,000,  and  Level  D  will  be 
those  areas  with  population  less  than  100,000.)  Also,  at  this 
time, the new designations "Metropolitan Statistical Area"
(MSA), "Primary Metropolitan Statistical Area" (PMSA), and
"Consolidated Metropolitan Statistical Area" (CMSA) will be
introduced.

² The final standards for establishing metropolitan statistical areas
following the 1980 census were published in the Federal Register for
January 3, 1980, Part VI, Vol. 45, No. 2, and in the December 1979 issue
of Statistical Reporter.


IMPLICATIONS  OF  THE  NEW  SYSTEM 

This  new  structure  for  defining  metropolitan  areas  is  fully 
consistent  with  earlier  concepts  of  a  metropolitan  statistical 
area.  The  new  standards  continue  the  practice  of  defining  a 
single  concentration  (sometimes  with  multiple  centers)  of  dense 
urban  development  of  a  specified  size,  with  strong  internal 
commuting  ties  and  weak  ties  to  any  other  densely  developed 
areas.  The  purpose  of  the  official  metropolitan  areas  also 
remains  unchanged— namely,  that  of  permitting  accurate 
comparisons  of  important  urban  centers  across  the  country.  The 
merit  of  this  new  structure  is  that  it  has  more  flexibility  than 
previous  classifications  of  metropolitan  areas.  It  provides  for 
recognition  of  the  fact  that  significant  differences  occur  as  a 
result  of  population  size  and  economic  interrelationships  with 
adjoining  areas.  It  recognizes  areas  which  are  important 
components  (or  building  blocks)  for  larger  metropolitan  areas. 
It  also  emphasizes  the  large  metropolitan  complexes  which,  in 
fact,  have  become  significant  for  megapolitan  development.  The 
levels  provide  an  additional  refinement  which  is  consistent  with 
current  usage  by  a  number  of  marketing  organizations.  Overall, 
the  new  classification  seeks  to  discourage  a  simplistic  view  of 
metropolitan  areas. 

Contrary  to  the  policy  of  the  Office  of  Federal  Statistical 
Policy  and  Standards,  various  Federal  programs  during  the 
1970's  increasingly  used  the  standard  metropolitan  statistical 
area  for  nonstatistical  program  purposes,  namely  in  the 
allocation  of  Federal  benefits.  In  turn,  areas  became  interested 
in  metropolitan  area  designation  because  it  affected  their 
eligibility  to  participate  in  Federal  funding  programs. 

Examples  of  program  uses  of  SMSA's  include  the  Community 
Development  Block  Grant  (CDBG)  program  and  the  Medicare 
and  Medicaid  programs.  Under  the  CDBG  program,  the 
Department  of  Housing  and  Urban  Development  is  required  by 
statute  to  allocate  80  percent  of  the  funds  to  SMSA's,  and  20 
percent to nonmetropolitan areas. Therefore, as the number of
SMSA's and counties within SMSA's increases, the competition
for the metropolitan share of the funds increases. Under the
Medicare and Medicaid programs in many States, reimbursements
to hospitals for routine costs are partially determined by
whether a hospital is located in a county within an SMSA or in a
nonmetropolitan area. The result is that limits on reimbursements
tend to be higher for hospitals in SMSA's, reflecting the
higher cost of routine care in metropolitan areas. These are just
two  examples  of  the  over  25  programs  which  currently  use 
SMSA's  in  their  eligibility  requirements. 

Consequently, when the criteria for establishing metropolitan
statistical areas were being revised, there was much concern
at  the  State  and  local  government  level  about  the  possible 
effects  on  funding  allocations.  In  response  to  this  concern,  the 
Federal  Committee  on  SMSA's  has  proposed  (and  most  agencies 
have  agreed)  that  statistical  redefinitions  should  not  be  used  to 
affect  program  allocations.  For  a  5-year  period  (beginning  in
1979),  Federal  agencies  have  agreed  to  use  the  earlier  SMSA 
definitions  where  they  were  critical  for  funding  qualification. 
An exception to this 5-year "hold harmless" provision is made
when the Congress determines, during reauthorization,
that some other standards should be utilized. The new classification,
with its levels and new terminology, should encourage the
Congress and the Federal agencies to examine their uses of
metropolitan areas for funding purposes and, it is hoped, to develop
alternative methods of allocating funds.

During  the  1970's,  the  standard  metropolitan  statistical  area 
gained  wide  acceptance  in  marketing  and  advertising 
applications.  For  example,  chain  retail  outlets  frequently 
emphasize  market  location  based  upon  an  area's  population 
ranking  within  the  SMSA  statistics.  During  the  1980's,  this 
application  will  be  more  difficult  because  there  will  be  three  sets 
of  metropolitan  statistical  areas. 

Table 1 demonstrates that rankings can be developed for any
of the three sets of areas, illustrating several important features
of the new structure. First, while areas will be classified into a
single category (PMSA or MSA), the significance of that
classification will vary according to use. For example, Canton,
Ohio, is shown as an MSA, ranking number 62 among all MSA's.
(Please recall that these are illustrative numbers based upon
1978 population estimates and 1970 SMSA boundaries.) If
viewed from the perspective of overall freestanding urban areas
(CMSA's and MSA's), the Canton area is ranked number 77.
This type of comparison is not likely to have widespread use,
but it can be interesting as a ranking of the urban areas to be
serviced by regional facilities such as airports. Finally, the
PMSA/MSA ranking (94th for Canton, Ohio) is likely to be used
by many people who are ranking markets. The major
components of consolidated areas are better for market analysis
for many purposes since they reflect units where internal
transportation and communications are logical subparts of the
megapolitan areas. Hence, ranking the components with other
areas that are also freestanding will be logical for some market
analysis purposes.
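
The three kinds of ranking can be sketched as one routine that ranks by population within any chosen combination of categories. The populations are 1978 estimates from table 1, but the ranks printed here are positions within this six-row sample only, not the national ranks quoted in the text; the function name is hypothetical.

```python
# (area, category, 1978 population estimate) for a few Ohio areas from table 1.
areas = [
    ("Cleveland, OH", "CMSA", 2_867_100),
    ("Cleveland, OH", "PMSA", 1_938_900),
    ("Akron, OH", "PMSA", 657_000),
    ("Columbus, OH", "MSA", 1_088_900),
    ("Canton, OH", "MSA", 403_700),
    ("Mansfield, OH", "MSA", 130_300),
]


def rank_within(rows, categories):
    """Rank areas of the given categories by population, largest first."""
    selected = sorted((r for r in rows if r[1] in categories),
                      key=lambda r: r[2], reverse=True)
    return {(name, cat): position
            for position, (name, cat, _pop) in enumerate(selected, start=1)}


msa_rank = rank_within(areas, {"MSA"})
cmsa_msa_rank = rank_within(areas, {"CMSA", "MSA"})
pmsa_msa_rank = rank_within(areas, {"PMSA", "MSA"})

canton = ("Canton, OH", "MSA")
print(msa_rank[canton], cmsa_msa_rank[canton], pmsa_msa_rank[canton])
```

As in the Canton example, the same area moves down the list as consolidated areas or their components are added to the set being ranked.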

There  is  some  controversy  about  the  use  of  consolidated  areas 
as  units  of  market  analysis.  It  is  my  view  that  the  extension  of 
the  consolidated  concept,  along  with  the  overall  new  structure 
discussed  above,  will  bring  more  focus  to  the  consolidated  area 
concept  than  occurred  earlier.  Since  the  consolidated  areas  were 
only introduced in 1975 and have been given less visibility in
Federal  statistics  than  they  will  be  given  in  the  new  system,  it  is 
reasonable  to  expect  that  the  consolidated  area  rankings  will 
take  on  greater  significance  in  the  1980's,  especially  for  those 
interested  in  the  Nation's  largest  urban  centers. 

Once  the  1980  counts  become  available,  we  can  determine  to 
what  extent  the  intercensal  estimates  and  growth  rates  were  in 


Table 1. Population and Rank of Metropolitan Statistical Areas (Sample Table)

(Data for potential SMSA's and PMSA's are illustrative and are based on existing SCSA's and the SMSA's comprising them.)

                                        ---------------- Rank ----------------         1978 population
State and area                  Category  CMSA  PMSA   MSA  CMSA/MSA  PMSA/MSA Level          estimate

Ohio
  Akron, OH                     PMSA              21                        57   B             657,000
  Canton, OH                    MSA                     62        77        94   B             403,700
  Cincinnati, OH-KY-IN          CMSA        20                    20             A           1,645,500
  Cincinnati, OH-KY-IN          PMSA              16                        26   A           1,389,100
  Cleveland, OH                 CMSA         9                                   A           2,867,100
  Cleveland, OH                 PMSA              11                        17   A           1,938,900
  Columbus, OH                  MSA                     15        30        35   A           1,088,900
  Dayton, OH                    MSA                     24        39        45   B             834,200
  Hamilton-Middletown, OH       PMSA              33                       144   B             256,400
  Huntington-Ashland, WV-KY-OH  MSA                     86       101       120   B             299,900
  Lima, OH                      MSA                    122       137       159   B             211,600
  Lorain-Elyria, OH             PMSA              32                       134   B             271,200
  Mansfield, OH                 MSA                    176       191       215   C             130,300
  Parkersburg-Marietta, WV-OH   MSA                    156       171       195   C             157,700
  Springfield, OH               MSA                    135       150       173   C             183,200
  Steubenville-Weirton, OH-WV   MSA                    149       164       188   C             162,000
  Toledo, OH-MI                 MSA                     31        46        53   B             775,800
  Wheeling, WV-OH               MSA                    136       151       174   C             181,100
  Youngstown-Warren, OH         MSA                     45        60        72   B             545,800
error as indicated in sample table 2. We shall also see how
accurate  our  predictions  were  of  counties  that  would  possibly 
be  affected  by  the  new  standards.  Further,  once  new 
metropolitan  statistical  areas  are  identified,  we  can  determine 
their  characteristics  based  on  the  1980  data. 

CONCLUSION 

As indicated earlier, the new metropolitan statistical area
classification has great flexibility for the Federal program
manager and for the marketing analyst. First, it requires that the
user  be  much  more  specific  about  his/her  objectives.  Second,  it 
provides  a  greater  range  of  alternatives  for  defining  arbitrary 
cutoff  points.  Third,  the  new  system  makes  it  difficult  to  use  a 
simplistic  approach  for  defining  market  qualifications  or 
program  eligibility. 

The  Federal  Committee  on  Standard  Metropolitan  Statistical 
Areas  anticipates  that  there  will  be  less  "blind"  utilization  of 
the  "MSA"  as  a  market  or  program  qualification  than  was  true 
in  the  past  decade. 


Table 2. Preliminary Counts from the 1980 Census (by State) (Sample Table)

State and area                    1970 pop.   1978 pop.       1978/1970         1980       1980/1970  Projected
                                               estimate  average growth  preliminary  average growth

Ohio
  Akron, OH                         679,239     657,000            -3.3
  Canton, OH                        393,789     403,700             2.5
  Cincinnati, OH-KY-IN            1,387,207   1,389,100             0.1
  Cleveland, OH                   2,063,729   1,938,900            -6.0
  Columbus, OH                    1,017,847   1,088,900             7.0
  Dayton, OH                        852,531     834,200            -2.2
  Hamilton-Middletown, OH           226,207     256,400            13.3
  Huntington-Ashland, WV-KY-OH      286,935     299,900             4.5
  Lima, OH                          210,074     211,600             0.7
  Lorain-Elyria, OH                 256,843     271,200             5.6
  Mansfield, OH                     129,997     130,300             0.2
  Parkersburg-Marietta, WV-OH       148,132     157,700             6.5
  Springfield, OH                   187,606     183,200            -2.4
  Steubenville-Weirton, OH-WV       166,385     162,000            -2.6
  Toledo, OH-MI                     762,658     775,800             1.7
  Wheeling, WV-OH                   181,954     181,100            -0.5
  Youngstown-Warren, OH             537,124     545,800             1.6
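
The 1978/1970 growth figures in sample table 2 appear to be the total percent change from the 1970 count to the 1978 estimate, rounded to one decimal. The short Python sketch below recomputes a few of them under that assumption; the function name `percent_change` and the selection of rows are choices made for this illustration only. The same calculation could be applied to the 1980 preliminary counts once they are filled in.

```python
def percent_change(base_count, later_value):
    """Percent change from the base count to the later value, to one decimal."""
    return round((later_value - base_count) / base_count * 100, 1)


# (1970 population, 1978 population estimate) taken from sample table 2.
ROWS = {
    "Akron, OH": (679_239, 657_000),
    "Canton, OH": (393_789, 403_700),
    "Columbus, OH": (1_017_847, 1_088_900),
    "Hamilton-Middletown, OH": (226_207, 256_400),
}

growth = {area: percent_change(p70, p78) for area, (p70, p78) in ROWS.items()}
for area, pct in growth.items():
    print(f"{area:<26}{pct:>6.1f}")
```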

Uses of Metropolitan Areas in the Private Sector and the
Impact of the New Standards

Edward  J.  Spar 
Marketing  Statistics,  Inc. 


In  1950,  the  Bureau  of  the  Budget  (later  renamed  the  Office 
of  Management  and  Budget)  first  defined  what  it  called  standard 
metropolitan  areas.  These  new  areas  were  to  be  used  by  all 
Federal  statistical  agencies  for  the  purpose  of  presenting  Federal 
statistics. At that time, the New York-Northeastern New Jersey
SMA was defined as a 17-county area that had a population of
about 13 million people. This represented approximately 8.6
percent of the total population of the United States. If one
were to rank the SMA's of 1950, the New York area would
have come in a comfortable first. By 1960, the area was reduced
to the nine New York State counties with a population of 10.6
million people, representing 6.0 percent of the U.S. population.
From 1960 to 1970, this area in terms of geography and ranking
had not changed. Under the new criteria, the New York
metropolitan area, now to be labelled a primary metropolitan
statistical area, will probably represent about 3.0 percent of the
total population of the United States. And for the first time,
New York will most likely rank second as a metropolitan
area. How could this have happened? And what effect will this
have upon the business community? In order to properly
answer these questions, let us first review the uses of
metropolitan areas in the private sector.

The major use of metropolitan areas in the private sector is
to define marketing areas. These areas are usually sales
territories, broadcast media advertising areas, or newspaper coverage
areas. These areas are analyzed as separate entities and also in
comparison to other areas through the use of ranking tables.
There are at present 676 metropolitan counties and county
equivalents which represent about 164.5 million people. This is
about 74.8 percent of the entire population of the United States,
though only 21.5 percent of the counties. Use of these areas
enables the marketer to reach the vast majority of the
population within a relatively small amount of land area. At the
same time, Federal, local, and private statistical organizations
produce vast amounts of detailed data for these areas, allowing
for  in-depth  demographic  and  socioeconomic  analyses.  These 
data  include  statistics  on  population,  income,  retail  trade,  and 
employment  which  are  crucial  in  determining  how  to  utilize 
and  evaluate  a  market.  Therefore,  it  makes  a  lot  of  sense  for  the 
marketer  to  utilize  metropolitan  areas. 


Let us first discuss sales territories. Since metropolitan areas
are dynamic (in other words, they continually change, thereby
reflecting the migration patterns of the population), these areas
nicely fit the needs of the marketer. It is to such areas that
one wants to send sales people, since such areas are so efficient.
Naturally, the sales executive will analyze data pertinent to his
or her product. Such an analysis might shed light on the
importance of one metropolitan area relative to another. For example,
the sales of diapers are not a function of simply the total
population; a younger market like Houston would most likely receive
a higher quota than the aging Nassau-Suffolk area even though
the overall population of each area is the same. And
information such as the updated age distribution is readily available
for metropolitan areas. One of the best ways to allocate quotas
is to first develop a percentage distribution on a
metropolitan-area basis consisting of a weighted average of the specific
population, income, and retail trade data. The weights, and
the choice of data, are a function of the product to be sold.
In the case of lawn mowers, for example, the population
component would be owner-occupied single-family homes. The
overall quota would then be allocated on the basis of this
weighted distribution. Such a distribution has a double
advantage. Since it is objective and representative of the
metropolitan area, it can also be used as a tool to evaluate the final
sales performance. By comparing the sales distribution on a
metropolitan-area basis to the quota distribution, an evaluation
of performance can be made. Metropolitan areas are far superior
as units of analysis to more local areas since they are not subject
to such problems as the local department store being owned by
the brother-in-law of one's competitor. Such problems tend to
average out at the metropolitan-area level.
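
The quota procedure just described can be sketched in a few lines of Python. Everything specific here is hypothetical: the three areas, their figures, the weights, and the function names `quota_shares` and `performance_index` are assumptions chosen for illustration, not data or methods taken from any actual marketing system.

```python
def quota_shares(areas, weights):
    """Weighted average of each area's share of every factor.

    areas:   {area name: {factor name: value}}
    weights: {factor name: weight}; weights need not sum to 1.
    """
    total_weight = sum(weights.values())
    factor_totals = {
        factor: sum(data[factor] for data in areas.values())
        for factor in weights
    }
    return {
        name: sum(
            weight * data[factor] / factor_totals[factor]
            for factor, weight in weights.items()
        ) / total_weight
        for name, data in areas.items()
    }


def performance_index(sales, shares):
    """Ratio of each area's share of actual sales to its quota share."""
    total_sales = sum(sales.values())
    return {
        name: (sales[name] / total_sales) / share
        for name, share in shares.items()
    }


# Hypothetical inputs (thousands of persons, $ millions, $ millions).
AREAS = {
    "Area A": {"population": 500, "income": 300, "retail_sales": 200},
    "Area B": {"population": 300, "income": 500, "retail_sales": 300},
    "Area C": {"population": 200, "income": 200, "retail_sales": 500},
}
WEIGHTS = {"population": 0.5, "income": 0.3, "retail_sales": 0.2}

shares = quota_shares(AREAS, WEIGHTS)
quotas = {name: round(10_000 * share) for name, share in shares.items()}
index = performance_index(
    {"Area A": 4_000, "Area B": 3_600, "Area C": 2_400}, shares
)
```

An index above 1 would suggest an area selling ahead of its quota share, and an index below 1 the reverse. For a product such as lawn mowers, the population factor would be replaced by a count of owner-occupied single-family homes.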

In order to aid the sales staff, advertising takes place at the
local level. This advertising may be in the form of radio,
television, or print. In the case of television advertising, the concept
of the television market is utilized. The Arbitron Areas of
Dominant Influence and the Nielsen Designated Market Areas
are the most common. These areas are developed based on the
signal strength of the central stations. However, the television
market's central areas correspond basically to metropolitan
areas. In the case of radio markets, metro rating areas, generally
corresponding to SMSA's, are defined. For newspapers, the
metropolitan area is the geographic area most often used to
measure audience reach. I mention these areas to point out the
importance of the metropolitan-area concept to the media
industry. Now back to the advertiser.

The basic problem to be resolved is how the national
advertiser, usually through the advertising agency, shall allocate his
or her dollars. The most common approach is through the use
of ranking tables. Some agencies will allocate the advertising
expenditures to the top 100 markets. Others will only use the
top 50. The metropolitan area or broadcast area is the standard
area to use in those rankings. The type of advertising that we are
speaking about is for local newspapers, radio, and spot TV.
The desire to be in the top 50 or even 100 is, as you can imagine,
fierce. Companies that produce population estimates are under
extreme pressure from markets that rank 51 or 101. The data
most often used in the ranking tables will be total population,
households, or adult population. Once the amount of advertising
dollars is allocated to a market, the next step is to determine
how to spend that money within that market. Those wishing to
use radio and/or television make use of rating reports from
such companies as A.C. Nielsen or Arbitron to determine
which stations to use. The rating services, by the way, utilize
metropolitan-area data in order to stratify their samples. The
basic sampling frames for media, market, and advertising
research firms are usually stratified by metropolitan counties vs.
nonmetropolitan balances of States.

Newspapers may be the largest users of metropolitan-area
data in the media field. The decision of whether or not to
advertise in a newspaper may be based on the metropolitan
coverage. Newspaper research attempts, as in the case of
broadcast research, to find out the demographics of the reader. Space
in newspapers is often sold by showing the potential advertiser
the composition of the market. However, in the case of
newspapers, the use of the metropolitan areas goes well beyond the
need to sell space. The circulation department uses the
metropolitan-area geography as a means to determine coverage.
In order to increase circulation, the newspaper analyzes ZIP-code
and tract data by specific demographics. The most
cost-effective area in which to accomplish this is the metropolitan area,
considering the density of population and the data available.

Up  to  this  point,  I  have  tried  to  give  you  an  idea  as  to  the 
degree  of  importance  that  metropolitan  areas  play  in  the  private 
sector.  It  is  safe  to  say  that  billions  of  sales  and  advertising 
dollars  are  allocated  based  upon  these  areas.  Let  us  now  turn  to 
the  new  criteria  and  see  what  their  impact  will  be  for  the 
business  community. 

I would like to review three major changes that will take
place and have the most effect on the private sector. The first
change eliminates the need for a central city of 25,000
population in order for the area to be metropolitan. This is one way in
which the new criteria make a concerted effort to deal with the
existence of suburban and exurban sprawl. The deemphasis of
the central city is to be commended, given the reality that the
suburban shopper need no longer travel to the "big city" for
his or her shopping needs. The mall concept along with the
creation of vast suburban shopping centers has been a central
factor in this decentralization process. And as energy costs such
as gasoline prices continue to increase, the desire to shop near
home will become a very important factor. Therefore, it is
unrealistic to require that areas have inner cities of large
population when the socioeconomic and marketing structure of these
areas requires a new topography.

The second important change is in the rules that define the
central core area. The new rules adopt the Bureau of the Census
urbanized area as the basis for determining the central counties.
Those counties with at least half their population in the
urbanized area will now qualify for the central core. Therefore,
there will be more central counties. Since this will mean more
employment centers in the central core, some new outlying
counties will be admitted to large metropolitan areas on the
basis of commuting data.

On this same subject, the suburban concept is further
strengthened by the criteria for outlying counties. The criteria
allow counties to be part of the metropolitan area even though
only a small percentage of the employed workers, namely 15
percent, commute to the central counties. However,
a degree of metropolitan character is required for good measure.
In marketing terms, these counties are the suburban fringe that
should be included in any sales and marketing strategy. The
criteria at the same time eliminate the low-density rural counties
that should be treated separately. As you can see, the first
two major changes have gone a long way in restructuring the
basic concept and reality of metropolitan areas.
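
The central and outlying county rules, as summarized above, can be written as a small decision function. The Python sketch below is a simplification under stated assumptions: it uses only the two thresholds mentioned in this talk (half the population in the urbanized area, 15 percent commuting), treats "metropolitan character" as a yes/no input supplied by the caller, and uses a hypothetical function name, `classify_county`. The published standards contain more detail than is modeled here.

```python
def classify_county(pct_pop_in_urbanized_area,
                    pct_workers_commuting_to_central,
                    has_metropolitan_character):
    """Classify a county as 'central', 'outlying', or 'excluded'.

    Percentages are on a 0-100 scale. The metropolitan-character flag
    stands in for tests that this sketch does not attempt to model.
    """
    # Central county: at least half the population in the urbanized area.
    if pct_pop_in_urbanized_area >= 50.0:
        return "central"
    # Outlying county: 15 percent commuting plus metropolitan character.
    if pct_workers_commuting_to_central >= 15.0 and has_metropolitan_character:
        return "outlying"
    # Otherwise the county is left out, e.g. a low-density rural county.
    return "excluded"


# Hypothetical counties, for illustration only.
print(classify_county(62.0, 5.0, True))
print(classify_county(20.0, 18.0, True))
print(classify_county(5.0, 18.0, False))
```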

The third, and most important, change attempts to create a
hierarchical structure of metropolitan areas. This completely
new approach is, in my opinion, a mixed blessing. Within
metropolitan areas of 1 million or more population, any county
or group of counties that constituted a metropolitan area on
January 1, 1980, will be recognized as a Primary Metropolitan
Statistical Area, unless local opinion is against such designation.
Any additional county or group of counties in 1-million-or-more
areas will also receive primary recognition if:

1. At least one county has a population of at least 100,000,

2. Sixty percent of the counties' population is urban, and

3. Less than 50 percent of its resident workers commute to jobs
outside the county.
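
The three conditions listed above lend themselves to a simple check. The following Python sketch is illustrative only: the class `CountyGroup`, the function `qualifies_as_primary`, and the sample figures are hypothetical, the thresholds are read as "at least" 100,000 and "at least" 60 percent, and the local-opinion provision is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CountyGroup:
    county_populations: List[int]  # population of each county in the group
    pct_urban: float               # percent of the group's population that is urban
    pct_commuting_out: float       # percent of resident workers with jobs outside


def qualifies_as_primary(group, parent_area_population):
    """Apply the three primary-area conditions within a 1-million-plus area."""
    if parent_area_population < 1_000_000:
        return False  # primary areas are defined only in the largest areas
    return (
        max(group.county_populations) >= 100_000  # condition 1
        and group.pct_urban >= 60.0               # condition 2
        and group.pct_commuting_out < 50.0        # condition 3
    )


# Hypothetical groups of counties, for illustration only.
suburban_group = CountyGroup([850_000, 250_000, 75_000], 85.0, 40.0)
bedroom_county = CountyGroup([120_000], 70.0, 65.0)

print(qualifies_as_primary(suburban_group, 9_000_000))
print(qualifies_as_primary(bedroom_county, 9_000_000))
```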

Let us look, first, at the positive side of such a radical and
highly original approach to the creation of metropolitan areas.
These criteria, which totally ignore any reference to cities, put
the final touch on the concept of suburbanization. For example,
we will, assuming that local opinion goes along, now have the
primary metropolitan area of Westchester-Putnam-Rockland.
From a market-segmentation point of view, this is a very logical
outcome, and marketers who have found it necessary to split up
the unusable and overly large New York area will now have an
official area designated for them. Since Nassau-Suffolk will
change its status only to the extent of substituting the
word "primary" in its title, that segmented market will stay
unchanged. New York City, by itself, will now be a Primary
Metropolitan Statistical Area. And now we see why New York
will rank second behind Los Angeles: it has been stripped of
its suburban counties. Although I am sure these criteria
were not created to help the marketing executive better target
his or her advertising and sales dollars, they will have a positive
effect on the private sector. I would raise one question here,
however, to the Office of Federal Statistical Policy and
Standards, which is responsible for these criteria. Why do the criteria
stop at 1 million or more as the minimum population needed
in an area in order to have primary areas? Why not allow this to
take place at all levels? Has OFSPS determined that no such
areas would meet the criteria under the 1-million level?

As previously mentioned, ranking tables are used in order
to allocate various types of advertising expenditures. Not
everyone may benefit from the fact that the Westchester area
may be a separate entity. For example, if you are a local
newspaper in a county that is part of a large metropolitan area, there
is a high probability that you will be included in the media
allocation schedule. In the case of Westchester, the Yonkers
Herald-Statesman with a daily circulation of 39,000 readers
would most probably be included since the paper is part of the
number-1-ranked market. The new area, with a population of
about 1,200,000, will rank below the top 25 metropolitan areas.
That number 25 is one of the major cut-off points for
distributing dollars. Markets below that rank are many times
simply out of luck. This leads to my major criticism of the
primary-area concept.

The  marketing  community  will  not  differentiate  between 
primary  and  nonprimary  metropolitan  areas.  Therefore,  both 
types  of  areas  will  show  up  in  one  ranking  table.  And  since 
these  areas  are  formed  from  two  sets  of  criteria,  I  predict  that 
there  will  be  some  hot  debates  as  to  who  belongs,  qualitatively, 
in  the  rankings.  Using  two  sets  of  standards  for  what  will  be 
perceived  as  one  type  of  area  is  a  step  backward  in  the  attempt 
to  standardize  the  rules.  The  problem  cannot  be  solved  by  using 
two ranking tables, one for regular metropolitan areas and one
for primaries. This would be too complex and probably
self-defeating when trying to make logical comparisons of areas.
Consolidated areas, which will only exist where primary areas
have been defined, are much too big to use as marketing areas.
Therefore, they cannot be compared to anything. It will be
necessary to educate the user as to the difference between
primary and nonprimary areas. I hope that once the marketers
see that the benefits outweigh the disadvantages, they will
accept both types of
areas  in  one  ranking  table. 

In  conclusion,  for  the  criteria  as  a  whole,  the  benefits  greatly 
outweigh  the  problems.  I  believe  that  for  the  private  sector, 
the  new  metropolitan  areas  will  be  more  usable  and  after  an 
initial  period  of  confusion  will  be  accepted  by  the  marketing, 
sales,  and  advertising  communities. 
ABSTRACT

This paper describes research conducted to date by the
Bureau of the Census in investigating the feasibility of applying
nonsurvey statistical methods to the estimation and projection
of socioeconomic data items for the city of Denver, Colorado.
The  methods  applied  fall  into  three  general  categories:  synthetic 
estimation,  categorical  data  analysis,  and  time  series  analysis. 
The  data  used  as  input  to  these  methods  have  been  obtained 
from  Federal,  State,  and  local  sources. 

The  research  is  being  conducted  as  part  of  the  Commerce/ 
Cities  program,  a  Department  of  Commerce  (DOC)  program 
with  the  objective  of  demonstrating  how  cities  can  better  use 
existing  DOC  resources  to  deal  with  specific  problems. 

INTRODUCTION 

Statement  of  the  Problem 

In  November  of  1977,  the  Secretary  of  Commerce,  Juanita  M. 
Kreps,  approved  a  program  designed  to  demonstrate  how 
existing  Department  of  Commerce  (DOC)  resources  could  be 
better  used  by  cities  to  deal  with  specific  city  needs.  This 
program, called the Commerce/Cities Program, was an
outgrowth of discussions with representatives of the U.S.
Conference  of  Mayors  and  the  National  League  of  Cities  and  the 
work  of  an  informal  DOC  task  force. 

In  order  to  meet  the  objective,  an  interagency  steering 
committee  was  appointed  and  charged  with  determining  the 
needs  of  cities  that  could  be  addressed  using  existing  DOC 
resources.  Represented  on  this  committee  were  such  DOC 
agencies as the Industry and Trade Administration, the Economic
Development Administration, the Office of Minority Business
Enterprises,  the  National  Bureau  of  Standards,  and  the  Bureau 
of  the  Census. 

Meetings with officials of cities selected for participation
in the Commerce/Cities Program identified a number of needs.
A recurring stated need of the cities was for better
small-area data. Duncan (1978), Purcell (1979), and Purcell and
Kish (1979), along with many others, reinforce this
requirement.


Due  to  its  experience  in  data  collection  and  statistical 
methodologies,  the  Census  Bureau  is  undertaking  a  research 
project  in  an  effort  to  help  cities  in  their  quest  for  small-area 
data.  The  city  of  Denver,  Colorado  was  selected  as  the  test  site 
for  this  undertaking.  In  this  research  project,  the  Census  Bureau 
is  providing  assistance  to  Denver  by  investigating  the  feasibility 
of using statistical/quantitative, nonsurvey methods for
estimating and projecting selected data items.

The  general  approach  in  this  project  is  as  follows.  First,  the 
Census  Bureau  and  the  city  have  selected  a  small  set  of  variables 
for  which  estimation  and  projection  methods  will  be  applied.  At 
the same time, methods of estimation and projection are being
identified and their potential for use is being evaluated. The
Census  Bureau  and  the  city  are  developing  a  data  base  that  is 
being  used  in  attempting  to  apply  estimation  and  projection 
methods.  The  Bureau  is  acquiring,  adapting,  and/or  developing 
software  necessary  for  use  in  applying  estimation  and  projection 
methods  and  is  then  applying  the  methods  using  the  data  base. 
The  city  and  the  Bureau  are  beginning  to  evaluate  and  compare 
the  various  methods  to  determine  whether  any  of  them  can  be 
used  or  modified  for  use  by  the  city,  taking  into  consideration 
cost,  quality  of  results,  and  other  criteria.  Any  methods  that  are 
found  usable  will  be  transferred  to  the  city,  and  city  personnel 
will  be  trained  in  their  use. 

Summary  and  Organization  of  Paper 

This  paper  describes  progress  made  by  the  Census  Bureau  to 
date  in  the  application  of  nonsurvey  techniques  for  estimating 
and projecting¹ certain socioeconomic data items in Denver.
These  data  items,  hereafter  referred  to  as  variables  of  interest, 
include  population  and  income.  A  full  listing  of  the  variables  of 
interest  is  given  in  the  section,  "The  Research  Problem  in 
Denver." 

¹The project has focused so far on estimation rather than projection
methods. Therefore, projection methods are not discussed in this paper.

All methods being considered for use in the project are
described in the section "Methods." The primary difference
between the estimation methods applied in this project and the
nonsurvey methods currently used by the Census Bureau is that
the current Bureau methods use primarily data from the
decennial censuses and administrative registers; in the Denver
project, data from intercensal sample surveys are being used as
well. The use of survey data has advantages and disadvantages
that are spelled out in the section "Methods." However, the
authors feel that use of such data, together with locally
generated administrative data, may be necessary if the project is
to succeed in enabling Denver, and possibly other cities, to
produce estimates that are sufficiently accurate for use in
economic development planning.

The  section  "Data"  describes  the  data  being  considered  for 
use  as  input  to  the  estimation  methods  and  gives  some 
preliminary  indications  of  whether  the  data  can  be  used.  The 
section  "Results"  summarizes  some  of  the  first  applications  of 
the methods, and the last section gives conclusions and
recommendations based on these preliminary applications.

Some  Caveats 

This section states clearly what the project is and is
not designed to accomplish, and it points out some general
problems that may be encountered. These problems are
discussed to reinforce that the project is one of research,
rather than production of data.

The primary purpose of the study is to investigate the
applicability of various statistical/quantitative, nonsurvey
methodologies for producing estimates and projections, not to
produce estimates and projections. There is no guarantee that
the  effort  will  be  successful.  Several  problems  could  prevent 
success. 

A  major  problem  may  be  the  lack  of  high-quality  data.  Many 
of  the  techniques  that  may  be  used  require  substantial  amounts 
of data. Even if there are enough data, the data may be too
imprecise for use in this project. In addition,
statistical/quantitative modeling techniques are generally used under
various technical assumptions. If these assumptions are not
satisfied, the techniques may not give valid results. In fact, the
modeling techniques may not yield satisfactory estimates and
forecasts even if sufficient high-quality data are
available and the technical assumptions are satisfied. For
example,  the  variance  of  the  estimates  may  be  too  large  to  be 
useful. 

The  city  of  Denver  is  helping  the  Census  Bureau  to 
understand  the  city's  economic  and  social  environment  so  that 
use  of  the  modeling  techniques  may  be  improved.  For  example, 
the  opening  and  closing  of  industries  may  affect  data  items  in 
some  sections  of  the  city.  In  addition,  the  city  and  the  Census 
Bureau are working together throughout the project to ensure
successful  transfer  to  the  city  of  those  methodologies  that  may 
be  found  usable.  This  cooperation  is  essential  to  the  success  of 
the  project. 

THE RESEARCH PROBLEM IN DENVER

Denver's Needs

The  city  of  Denver  wants  to  develop  a  data  base  to  be  used  in 
its  economic  development  planning.  This  data  base  is  to  consist 
of  socioeconomic  data  items  for  the  city  level  and,  where 
possible, smaller geographic areas. In addition, the city would
like to obtain short-term projections for at least some of the
items  in  the  data  base. 

The  data  items  of  interest  are  as  follows: 

1. Total population

2. Population characteristics

   a. Age

      (1) Preschool (0 to 4 years)
      (2) School age (5 to 18 years)
      (3) Young adult (19 to 44 years)
      (4) Middle age (45 to 59 years)
      (5) Senior citizens (60+ years)

   b. Race/ethnicity

      (1) Anglo
      (2) Black
      (3) Chicano

   c. Sex

   d. Income

      (1) Proportion of population earning income below the
          poverty level
      (2) Per capita income

The  goal  is  to  enable  the  city  to  apply  these  methods  to  produce 
its  own  estimates  at  the  city  level  and  below. 

Data  Currently  Available  for  Denver 

Currently,  some  of  the  above  items  are  available  for  Denver. 
Aside  from  the  data  available  from  the  decennial  censuses, 
which  cover  all  the  variables  of  interest  in  this  project,  the 
Census  Bureau  publishes  intercensal  estimates  of  population  and 
income for Denver and other cities.² However, the intercensal
estimates  do  not  include  all  the  population  characteristics  the 
city  is  interested  in.  In  addition,  subcity-level  data  that  would 
be  useful  for  economic  development  planning  are  not  available 
between  censuses.  Therefore,  the  available  data  are  not  timely. 

The  city  of  Denver  has  attempted  to  produce  the  desired 
data, but city officials are not satisfied with the results.³ They
would  like  to  take  advantage  of  the  knowledge  of  estimation 
methods  and  data  resources  available  to  the  Census  Bureau.  This 
project  is  designed  to  allow  Denver,  and,  it  is  hoped,  other  cities 
to  do  this. 

METHODS 

This  section  contains  a  description  of  the  estimation  methods 
that  are  being  used  and  considered  for  use  in  this  project.  The 
section  "Data"  describes  the  data  that  will  be  considered  for  use 
as  input  to  these  methods.  Stated  in  terms  of  data  and  methods. 


^Denver  is  a  city  and  a  county,  so  publications  containing  data  for 
both  cities  and  counties  are  relevant.  For  all  counties  and  standard 
metropolitan  statistical  areas  (SMSA's)  annual  population  estimates  for 
the  years  1971-1978  can  be  obtained  from  the  following  Census  Bureau 
reports:  U.S.  Bureau  of  the  Census,  Current  Population  Reports,  Series 
P-25,  Nos.  505,  517,  527,  530-532,  535,  537,  618,  620,  709,  739,  810, 
and  873;  and  U.S.  Bureau  of  the  Census,  Current  Population  Reports, 
Series  P-26,  Nos.  49-93.  For  cities  below  the  county  level  and  minor  civil 
divisions,  the  Bureau  publishes  population  and  per  capita  income 
estimates.  For  example,  see  U.S.  Bureau  of  the  Census,  Current 
Population  Reports,  Series  P-25,  Nos.  740-789. 

^City officials in Denver have expressed this view to the authors.
one goal of the project is to determine whether the data exist to
enable  the  use  of  the  methods  that  will  be  tried.  This  section 
relies  heavily  on  Purcell  and  Kish  (1979),  Purcell  (1979),  and 
their  references. 

Purcell  and  Kish  (1979)  have  classified  the  various  small- 
domain  estimation  methods  according  to  the  data  they  use. 
Their  classification  is  reproduced  in  table  1.  (See  Purcell  and 
Kish,  1979,  p.  368.) 

Table 1. Existing Small-Domain Estimation Methods
Classified by Sources of Data

[Table layout not recoverable. Rows (methods): symptomatic
accounting; regression-symptomatic; synthetic (ratio);
sample-regression; synthetic regression; base unit; categorical
data analysis methods. Columns (data sources): census; register;
sample. In the original, an asterisk (*) indicates that the
designated method uses the indicated type of data.]


There are other approaches, many of which combine one or
more methods to obtain a composite estimate. A brief
discussion of all the methods follows.


Symptomatic  Accounting  Techniques  (SAT) 

These techniques are the oldest of the small-domain
estimation methods. They use logical demographic relationships in
combination  with  relationships  based  on  previous  data.  Basic 
demographic  accounting  identities  relate  births,  deaths,  and 
migration  to  the  change  in  population.  In  addition,  other 
equations  are  used  to  relate  population  growth  to  growth  in 
symptomatic  variables  such  as  numbers  of  births  and  deaths,  of 
dwellings,  of  school  enrollments,  and  of  tax  returns.  Generally, 
these  latter  relationships  are  developed  and  validated  using  data 
from  population  censuses. 

The  Census  Bureau's  Component  Methods  I  and  II.  The  primary 
objective  of  these  two  methods  (see  U.S.  Bureau  of  the  Census, 
1949  and  1966)  is  to  enable  estimation  of  the  net  civilian 
migration  component  of  population  change.  In  both  methods, 
estimation  of  migration  is  based  on  the  assumption  that  the 
migration  rate  of  the  total  population  is  based  on  the  migration 
rate  of  school  children.  The  methods  differ  in  the  way  that  the 
migration  rate  of  school  children  is  estimated.  In  addition,  the 
methods  take  direct  account  of  natural  increases  in  population 
and  net  loss  to  the  military. 
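
The accounting behind the component methods can be illustrated with a minimal sketch. It assumes, purely for illustration, that the net migration rate of the total population equals the rate observed for school-age children; the actual Component Methods I and II estimate and adjust that rate in ways not detailed here. The function name, its arguments, and all figures are hypothetical.

```python
def component_estimate(base_pop, births, deaths, net_military_loss,
                       school_expected, school_observed):
    """Postcensal population from a demographic accounting identity.

    school_expected: school-age population expected with no migration
    school_observed: school-age population implied by current enrollment
    """
    # Migration rate of school-age children since the census.
    school_migration_rate = (school_observed - school_expected) / school_expected
    # Illustrative assumption: the total population migrates at the same rate.
    net_migration = school_migration_rate * base_pop
    natural_increase = births - deaths
    return base_pop + natural_increase + net_migration - net_military_loss


# Hypothetical figures for one county.
estimate = component_estimate(
    base_pop=500_000, births=40_000, deaths=25_000, net_military_loss=2_000,
    school_expected=80_000, school_observed=82_000,
)
print(round(estimate))
```

In this sketch natural increase, migration, and military loss enter as separate additive components, which is what distinguishes the component methods from the ratio-based techniques described next.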

The Vital Rates Technique. This approach uses large-area birth
and death rates and a censal ratio procedure (discussed
immediately below) to estimate the required local-area
population sizes. The basic assumption underlying this technique is
that the local-area birth and death rates have changed in the
same proportions as the rates for the large areas.

Censal  ratio  methods  consist  of  (1)  computing  the  ratio  of 
each  symptomatic  data  item  to  the  total  population  at  the 
census  date,  (2)  assuming  the  ratio  continues  to  hold  at  the 
(intercensal)  estimate  date,  and  (3)  dividing  this  ratio  into  the 
value  of  the  symptomatic  series  for  the  estimate  date  to  obtain 
the  intercensal  population  estimate.  (U.S.  Bureau  of  the  Census, 
1973,  p.  753.) 
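
The three steps of a censal ratio method can be sketched as follows. This is an illustrative sketch only: the function name and the figures are hypothetical, and a single symptomatic series (for example, school enrollment) is assumed.

```python
def censal_ratio_estimate(census_pop, symptomatic_at_census, symptomatic_now):
    """Intercensal population estimate from one symptomatic series."""
    # Step 1: ratio of the symptomatic item to population at the census date.
    ratio = symptomatic_at_census / census_pop
    # Step 2: assume the same ratio holds at the estimate date.
    # Step 3: divide the ratio into the current symptomatic value.
    return symptomatic_now / ratio


# Hypothetical figures: enrollment of 100,000 at the census, 110,000 now.
population_now = censal_ratio_estimate(500_000, 100_000, 110_000)
print(round(population_now))
```

The estimate is only as good as step 2: any drift in the ratio since the census passes directly into the population estimate.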

The Composite Method. This method was devised as an
alternative to the vital rates technique. The local-area
population is divided into distinct age groups; then, population
estimates are obtained separately for each group, using the
techniques and data considered most appropriate for each
group. The resulting subgroup estimates are summed to obtain a
local-area population estimate. (U.S. Department of Health,
Education, and Welfare, 1959, p. 161.)

The  Housing  Unit  Method.  In  this  method,  current  estimates  of 
the  number  of  housing  units  in  the  local  areas  and  of  the  average 
number  of  individuals  per  housing  unit  are  used,  based  on  the 
assumption  that  changes  in  the  number  of  housing  units  reflect 
changes  in  population.  This  method  and  a  modification  are 
discussed  by  the  U.S.  Bureau  of  the  Census  (1969),  and  a 
further  modification  is  given  by  Rives  (1976). 
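
A minimal sketch of the housing unit calculation follows. The function name and the figures are hypothetical, and the optional group-quarters term is an assumption added for illustration rather than part of the description above.

```python
def housing_unit_estimate(housing_units, persons_per_unit, group_quarters_pop=0):
    """Population implied by the housing stock.

    group_quarters_pop is an illustrative extra term (an assumption) for
    people who do not live in housing units.
    """
    return housing_units * persons_per_unit + group_quarters_pop


# Hypothetical figures for a local area.
population = housing_unit_estimate(housing_units=200_000, persons_per_unit=2.5)
print(round(population))
```

Both inputs must be current: an up-to-date housing count combined with an outdated persons-per-unit figure will misstate the population whenever household size is changing.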

The Administrative Records Method. This procedure, described
by Starsinic (1974), is similar to the census component
methods, except that the estimate of net migration is based on
individual income tax returns filed with the Internal Revenue Service.
The use of individual records permits estimates for very small
areas, but confidentiality provisions limit use of the method to
the Federal Government.

In general, the Census Bureau's Population Division (POP)
prepares population estimates for counties by averaging
estimates from several of these methods and the ratio-correlation
method, which is discussed in the next section. The
combination of methods used sometimes varies by State or by year. For
a summary of the methods used currently to prepare county
estimates for each State, see U.S. Bureau of the Census (1980),
pp. 2-3.

Regression-Symptomatic  Procedures 

These  methods  are  based  on  the  fitting  of  a  functional 
relationship,  using  least-squares  regression,  between  the  variable 
of  interest  and  the  symptomatic  variables.  Typical  symptomatic 
variables  include  births,  deaths,  elementary  school  enrollment, 
tax  returns  filed,  motor  vehicle  registration,  employment,  voter 
registration or votes cast, bank deposits, and sales taxes. (U.S.
Bureau of the Census, 1973, p. 756.)

The  Ratio-Correlation  Method  (Regression  Method).  In  this 
method,  a  regression  equation  is  estimated,  usually  employing 
data  from  the  two  previous  censuses.  The  independent  variables 
in  the  regression  are  the  changes  during  the  previous  intercensal 
period  in  a  local  area's  share  of  the  total  for  the  parent  area  in 
several  symptomatic  series  (e.g.,  the  change  in  a  county's  share 
of births in a State between 1960 and 1970). The dependent
variable  is  the  corresponding  change  in  population.  In  the 
ratio-correlation  method,  the  changes  are  expressed  as  ratios 
(e.g.,  the  ratio  of  a  county's  share  of  births  in  1970  to  its  share 
in  1960). 

To  compute  postcensal  estimates,  symptomatic  data  from 
the  postcensal  period  are  used  to  determine  the  values  of  the 
independent variables in the postcensal period. Using the
regression coefficients as determined above, postcensal
population estimates are then computed for each local area.

This  method  depends  on  the  assumption  that  the  observed 
statistical  relationship  between  the  independent  and  dependent 
variables  will  persist  in  the  current  postcensal  period.  The 
adequacy  of  the  assumption  depends  not  only  on  the  stability 
over  time  of  the  underlying  relationships,  but  also  on  the  size  of 
the  multiple  correlation  in  the  regression  and  on  the  number  of 
independent  variables  used.  (Purcell  and  Kish,  1979,  p.  373, 
U.S.  Bureau  of  the  Census,  1973,  p.  756.) 
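
The two stages of the ratio-correlation method (fit on the previous intercensal period, then apply to postcensal symptomatic data) can be sketched as below. This is a hedged illustration, not the Census Bureau's procedure: the data are invented, only two symptomatic series are used, and a small hand-written least-squares solver stands in for a statistical package. All names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data for five counties: ratio of 1970 share to 1960 share of
# State births and of State school enrollment (independent variables) ...
symptomatic_ratios = [(1.00, 1.02), (1.10, 1.05), (0.90, 0.95),
                      (1.05, 1.10), (0.95, 0.90)]
# ... and the same ratio for population share (dependent variable).
population_ratios = [1.008, 1.070, 0.930, 1.065, 0.935]

coefficients = fit_ols(symptomatic_ratios, population_ratios)

# Postcensal step: the county's share ratios since 1970 are plugged in.
b0, b1, b2 = coefficients
predicted_ratio = b0 + b1 * 1.04 + b2 * 1.03
share_1970, state_population_now = 0.08, 2_500_000
county_estimate = share_1970 * predicted_ratio * state_population_now
print(round(county_estimate))
```

The sketch makes the key assumption visible: the coefficients come entirely from the earlier intercensal period and are simply carried forward.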

The Difference-Correlation Method. This method has been
presented by O'Hare (1976). It is almost identical to the
ratio-correlation method; the distinction lies in the
construction of the variables that are used to measure change over
time. In the ratio-correlation method, these variables are ratios
(e.g., the ratio of a county's share of total State births in 1970
to its share in 1960). In the difference-correlation method, these
ratios are replaced by differences (e.g., the difference between a
county's share of total State births in 1970 and its share in 1960).
O'Hare has claimed that, for his data, the relationships he
estimated using differences of proportions showed more
stability over time than other relationships obtained using ratios of
proportions. This claim has received support from others (see
Purcell and Kish, 1979, p. 373, for a fuller discussion).
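
The only difference between the two regression methods is how each change variable is built from the same pair of shares. A small sketch with hypothetical figures and a hypothetical function name:

```python
def change_variables(share_1960, share_1970):
    """Build both change measures from a local area's share of a State total."""
    return {
        "ratio": share_1970 / share_1960,        # ratio-correlation
        "difference": share_1970 - share_1960,   # difference-correlation
    }


# A county whose share of State births rose from 5.0 to 5.5 percent.
births = change_variables(0.050, 0.055)
print(births)
```

Everything downstream (fitting the regression, applying it to postcensal data) proceeds identically once the variables are constructed.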

Synthetic  Estimation  Methods 

These methods are applied as follows. First, sample data are
used to estimate the variable of interest (e.g., unemployment)
for different subclasses of the population (e.g., sex) at a higher
level of geographic aggregation than required (e.g., for a State).
Then, these estimates are scaled by the proportional incidences
of these subclasses in each small area (e.g., by the proportion of
males in each county in the State) to obtain estimates for the
small area. Synthetic estimates will be correct if the
composition of the small areas (e.g., the proportion of males in each
county) is known and if the large-area subclass estimates
accurately reflect the same subclasses in the small areas. A more
detailed discussion of synthetic estimation, with references, is
given by Purcell and Kish (1979, section 3).
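
The scaling step can be sketched in a few lines. The rates, proportions, and names below are hypothetical and serve only to illustrate the calculation.

```python
def synthetic_estimate(large_area_rates, small_area_proportions):
    """Weight large-area subclass rates by the small area's composition."""
    return sum(large_area_rates[group] * small_area_proportions[group]
               for group in large_area_rates)


# State-level unemployment rates by sex (from a sample survey) ...
state_rates = {"male": 0.06, "female": 0.08}
# ... and one county's population composition (e.g., from the census).
county_composition = {"male": 0.48, "female": 0.52}

county_rate = synthetic_estimate(state_rates, county_composition)
print(round(county_rate, 4))
```

Two counties with the same composition receive the same synthetic estimate regardless of local conditions, which is the source of the bias discussed later for this estimator.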

The  Sample-Regression  Method 

This  method  is  similar  to  the  ratio-correlation  method,  but 
relies  heavily  on  current  data  to  estimate  the  parameters  in  the 
model.  Like  the  ratio-correlation  method,  the  sample-regression 
method  is  based  on  a  regression  equation  using  symptomatic 
variables. But the equation is estimated using sample estimates
of growth in the variable of interest in the postcensal period,
rather than using actual measures taken from the previous two
censuses as in the ratio-correlation method.

The  postcensal  growth  in  the  symptomatic  variables  and  the 
estimate  of  the  postcensal  growth  in  the  variable  of  interest  are 
calculated  at  the  primary  sampling  unit  (PSU)  level.  To  obtain 
estimates  for  the  local  areas  (which  in  most  cases  will  not 
correspond  to  PSU's),  the  postcensal  values  of  the  symptomatic 
indicators  for  the  local  areas  are  substituted  into  the  estimated 
regression  equation. 

In comparison with the ratio-correlation method, in which
old census data are used to estimate the regression parameters,
the sample-regression method avoids the problem of changes
over time in the structural relationships (i.e., the regression
equation). However, the price paid for this gain is the
introduction of sampling error into the variable of interest.
Hence, the relative accuracy of the two methods depends on the
balance between error due to changing structural relationships
and error due to sampling.

The approach as described here is generally attributed to
Ericksen (1971). The method has been applied to the estimation
of  county  and  State  populations  using  1970  census  data  on 
population  growth  (see  Ericksen,  1973,  1974). 

Synthetic-Regression Procedures

The synthetic approach discussed above is limited because it
cannot  account  for  changes  over  time  in  the  distribution  of  the 
associated  variables  across  the  small  domains.  For  example,  the 
opening  of  a  nursing  home  may  drastically  affect  the  proportion 
of  the  elderly  in  its  area.  Two  methods  have  been  developed  to 
handle  this  problem.  Both  use  symptomatic  information  at  the 
local  level  in  conjunction  with  the  synthetic  estimate  to  form  an 
improved  estimate  that  reduces  the  bias  without  substantially 
increasing  the  variance. 

The  Regression  Adjusted  Synthetic  Method.  Essentially,  in  this 
method  the  deviation  of  the  synthetic  estimate  from  the  true 
value  is  expressed  as  a  regression  function  of  symptomatic 
variables  such  as  births,  deaths,  and  school  enrollments.  The 
main  problem  with  this  method  is  that  it  is  impossible  to 
estimate  the  regression  coefficients  directly  since  the  dependent 
variable  is  defined  using  the  true  value  of  the  variable  of 
interest,  which  is  unknown.  One  method  that  has  been  used  to 
circumvent  this  problem  consists  of  computing  the  regression  at 
a  higher  geographic  level  for  which  good  estimates  of  the 
variable  of  interest  are  available.  For  more  details,  see  Levy 
(1971). 

The  Combined  Synthetic-Regression  Method.  This  method  is 
basically  the  same  as  the  sample-regression  method  except  that 
the  synthetic  estimate  is  added  as  an  additional  independent 
variable.  It  has  been  used  by  Nicholls  (1977)  to  estimate 
local-area  populations  in  Australia  and  by  Gonzalez  and  Hoza 
(1978)  to  estimate  small-area  unemployment  in  the  United 
States. Like the sample-regression method, successful
application of the combined synthetic-regression method depends on
the availability of estimates for a sample of local areas.
The Base Unit Method

In  looking  for  a  method  that  could  be  used  easily  for 
variables  other  than  population  size,  Kalsbeek  (1973)  proposed 
a  method  based  on  splitting  up  the  small  domains  into  smaller 
base  units  and  then,  by  using  the  symptomatic  variables  in  a 
clustering  algorithm,  classifying  each  base  unit  into  one  of  k 
groups  of  base  units  for  which  estimates  have  been  obtained 
using  a  survey  sample.  The  small-domain  estimates  are  obtained 
by  taking  weighted  combinations  of  the  grouped  base-unit 
estimates.  For  a  more  detailed  discussion  of  the  method,  see 
Purcell  and  Kish  (1979,  pp.  375-76)  and  Kalsbeek  (1973). 

The  main  advantage  of  this  method  over  the  sample- 
regression  method  is  that  no  special  functional  form  is  assumed 
between  the  variable  of  interest  and  the  symptomatic  variables. 
However,  application  of  the  method  is  restricted  because 
estimates  must  be  available  for  a  sample  of  base  units. 


Categorical  Data  Analysis  Techniques 

This  approach,  which  is  proposed  by  Purcell  (1979),  is  a 
means  of  estimating  small-domain  characteristics  that  can  be 
represented  as  cross-classified  frequency  data  (e.g.,  population 
or  percent  of  population  by  age,  race,  or  sex).  As  Purcell  states 
in  his  thesis  abstract  (1979,  p.  2)  — 

The  aim  is  to  obtain  estimated  frequencies,  within  each  of 
a  number  of  small  domains,  for  a  set  of  categories  of  the 
variable  of  interest.  At  some  previous  time  (usually  the 
census)  these  counts  are  assumed  available,  further  cross- 
classified  by  a  set  of  associated  variables,  and  the  resulting 
cross-tabulation  is  termed  the  'association  structure'.  For 
the  association  structure  updated  margins  are  also 
assumed  available  (usually  derived  from  current  sample 
survey  information),  and  are  jointly  termed  the  'allocation 
structure'.  The  current  small  domain  estimates  are  then 
obtained  through  the  use  of  an  iterative  proportional 
fitting  algorithm  to  force  the  original  cross-tabulation 
(association  structure)  to  agree  with  the  new  margins 
(allocation  structure). 

Purcell calls these estimators "Structure Preserving Estimators"
(abbreviated SPREE). The name arises because the estimators
preserve all the interactions between the associated variables and
the variable of interest except the interactions that are explicitly
redefined by the allocation structure. Purcell also shows that
the basic synthetic estimator described above can be viewed as a
special case of the SPREE estimator.

Purcell  (1979)  has  applied  this  approach  to  vital  statistics  and 
population  data  and  tested  it  empirically  on  State  estimators  for 
each  of  four  different  causes  of  mortality.  Previously,  Chambers 
and  Feeney  (1977)  have  applied  the  approach  to  obtain 
small-domain  estimates  of  work  force  status;  Bousfield  (1978) 
has  applied  it  to  obtain  intercensal  population  estimates  for 
age-by-race-by-sex  cells  for  the  Chicago  metropolitan  area. 
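
The iterative proportional fitting step at the heart of this approach can be sketched for a simplified two-way case. This is an illustration under stated assumptions, not Purcell's full procedure: the association structure is reduced to a small-domain by category table from the census, the allocation structure is taken to be current domain totals and current category totals, and all names and counts are hypothetical.

```python
def iterative_proportional_fit(association, row_totals, col_totals,
                               max_iter=200, tol=1e-9):
    """Rescale a two-way table until it matches new row and column totals."""
    table = [[float(cell) for cell in row] for row in association]
    for _ in range(max_iter):
        # Scale each row (small domain) to its current total.
        for i, target in enumerate(row_totals):
            current = sum(table[i])
            table[i] = [cell * target / current for cell in table[i]]
        # Scale each column (category) to its current total.
        for j, target in enumerate(col_totals):
            current = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / current
        # After the column step only the row totals can still be off.
        gap = max(abs(sum(row) - target)
                  for row, target in zip(table, row_totals))
        if gap < tol:
            break
    return table


# Census counts: two small domains (rows) by two categories (columns).
census_counts = [[300, 700], [200, 800]]
# Current margins, e.g., from a recent sample survey and population estimates.
domain_totals = [1100, 1000]
category_totals = [600, 1500]

updated = iterative_proportional_fit(census_counts, domain_totals, category_totals)
cross_product = (updated[0][0] * updated[1][1]) / (updated[0][1] * updated[1][0])
print([[round(cell, 1) for cell in row] for row in updated], round(cross_product, 4))
```

The cross-product ratio of the fitted table equals that of the census table, which is the sense in which the interaction not covered by the new margins is preserved.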

Composite  Estimators 

Much  of  the  most  recent  research  in  small-area  estimation  has 
centered  on  forming  composite  estimators  that  are  designed  to 
take  advantage  of  the  strengths  of  individual  estimators  while 
they  downplay  the  estimators'  weaknesses.  The  approach  is  not 
new;  it  has  been  used  in  the  symptomatic  accounting  techniques 
and  the  synthetic  regression  approaches.  Purcell's  and  Kish's 
discussion  of  four  general  methods  (1979,  pp.  378-79)  is 
summarized  here. 

The  Composite  Synthetic  Approach.  In  cases  where  there  are 
sample  estimates  from  the  small  domains,  it  may  be  useful  to 
combine  them  with  synthetic  estimates.  The  direct  sample 
estimate  is  generally  unbiased  but  has  a  high  variance;  the 
synthetic  estimate  is  biased  but  in  general  has  a  smaller  variance. 
A  suitably  weighted  linear  combination  of  the  two  estimates 
will  yield  a  composite  estimator  with  bias  and  variance  between 
those  of  the  original  estimates.  Schaible  (1978)  and  Schaible  et 
al.  (1977)  have  done  exploratory  work  in  this  area. 
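
A weighted combination of this kind can be sketched as follows. The weight used here is one common choice and is an assumption, not taken from the works cited: it minimizes mean squared error when the direct estimate is unbiased and the errors of the two estimates are uncorrelated. In practice the variance and mean squared error would themselves have to be estimated. All names and figures are hypothetical.

```python
def composite_estimate(direct, synthetic, var_direct, mse_synthetic):
    """Combine a direct sample estimate with a synthetic estimate.

    Returns the composite estimate and the weight given to the direct estimate.
    """
    # More weight goes to the direct estimate when its variance is small
    # relative to the mean squared error of the synthetic estimate.
    weight = mse_synthetic / (var_direct + mse_synthetic)
    return weight * direct + (1 - weight) * synthetic, weight


# Hypothetical small-area unemployment rates.
estimate, weight = composite_estimate(
    direct=0.10, synthetic=0.07, var_direct=0.0004, mse_synthetic=0.0001)
print(round(estimate, 4), round(weight, 2))
```

With a small sample the direct variance is large, the weight falls toward zero, and the composite leans on the synthetic estimate; with a large sample the reverse holds.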


^The interactions are terms in the log-linear model that underlies the
SPREE estimates. The model is for a three-way table, and can be written
as

$$\ln f_{ijk} = \mu + \mu_i + \mu_j + \mu_k + \mu_{ij} + \mu_{ik} + \mu_{jk} + \mu_{ijk}$$

where each variable i, j, or k has several categories, and

$\ln f_{ijk}$ = the natural logarithm of the count for cell ijk (e.g., the
population in Denver's northwest community (i) that is under 5 years old
(j) and is black and female (k)).

$\mu$ = an overall mean.

$\mu_i$ = the effect of variable i (e.g., community) on the cell count,
with similar definitions for $\mu_j$ and $\mu_k$.

$\mu_{ij}$ = the effect on $f_{ijk}$ of the interaction between variables
i and j (e.g., community and age), with similar definitions for the other
two-way interactions.

$\mu_{ijk}$ = the effect on $f_{ijk}$ of the three-way interaction between
variables i, j, and k (e.g., between community, age, and race-and-sex).

For more detailed discussions of log-linear models, see Bishop, Fienberg,
and Holland (1975) or Fienberg (1977).
Composite Ratio-Correlation and Sample-Regression Method.
According  to  Purcell  and  Kish  (1979,  p.  378): 

The  ratio-correlation  and  sample-regression  methods  are 
not  so  much  different  methods  as  different  estimates; 
differences  between  the  two  arise  chiefly  from  different 
assumptions  regarding  available  data.  The  sample- 
regression  method  uses  current  sample  information  to 
estimate  the  regression  coefficients,  while  the  ratio- 
correlation  technique  uses  more  precise  but  out-of-date 
coefficients. 

Royall  (1974)  suggests  using  a  linear  combination  of  the  new 
and  old  regression  coefficients,  in  which  the  weights  on  the 
coefficients  are  determined  using  a  model  that  describes  the 
change  in  the  coefficients  over  time. 

James-Stein  and  Bayesian  Estimates.  Certain  prior  probability 
distributions  for  the  variable  of  interest  yield  posterior  estimates 
which  are  typically  a  linear  combination  of  simple  estimates 
over  small  domains  and  estimates  for  large  domains. 

These  estimators  can  be  used  in  situations  where  only 
samples  are  available  and  auxiliary  data  from  censuses  or 
registers  are  lacking.  But  the  method  can  also  be  used  with 
auxiliary information. The Census Bureau has applied a
modified James-Stein estimator to sample data from the 1970 census
to produce base estimates of per capita income for the general
revenue sharing program (Fay and Herriott, 1977; Fay, 1979).

For  more  details  on  these  approaches,  see  Efron  and  Morris 
(1973),  Purcell  and  Kish  (1979,  pp.  378-79),  and  the  Fay  and 
Herriott  work  referenced  above. 
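
To make the idea of shrinking small-domain estimates toward a large-domain value concrete, here is a minimal sketch of a positive-part James-Stein estimator that shrinks toward the mean of the small-domain estimates. It assumes a known, common sampling variance and more than three domains; it is not the modified estimator used by the Census Bureau, and the names and figures are hypothetical.

```python
def james_stein(estimates, sampling_variance):
    """Shrink small-domain estimates toward their overall mean.

    Assumes a known, common sampling variance, more than three domains,
    and estimates that are not all identical.
    """
    k = len(estimates)
    mean = sum(estimates) / k
    spread = sum((y - mean) ** 2 for y in estimates)
    # Shrinkage factor, truncated at zero (the "positive-part" rule).
    factor = max(0.0, 1 - (k - 3) * sampling_variance / spread)
    return [mean + factor * (y - mean) for y in estimates]


# Hypothetical per capita income estimates for five small domains.
direct = [4000, 4500, 5000, 5500, 6000]
shrunk = james_stein(direct, sampling_variance=100_000)
print([round(v) for v in shrunk])
```

Each result is a linear combination of the domain's own estimate and the overall mean, with the mixing proportion driven by how noisy the direct estimates are relative to their spread.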

DATA 

The  decennial  censuses  provide  the  most  detailed  and 
accurate  data  available  on  all  the  variables  of  interest  at  the 
proper  geographic  levels,  but  the  data  are  not  timely.  The  data 
described in this section, obtained from surveys and
administrative registers produced at the Federal, State, and local levels, are
being examined for possible use in the estimation methods. The
survey  and  administrative  data  can  be  for  higher  geographic 
levels  than  the  levels  of  interest.  One  of  the  important  objectives 
of  the  project  is  to  determine  whether  the  sources  examined 
here  will  provide  sufficiently  accurate  and  timely  input  data. 

Federal Survey Data

Three  Federal  surveys  have  been  investigated  so  far:  the 
Annual Housing Survey (AHS), the Survey of Income and
Education (SIE), and the Annual Demographic File (ADF) from
the March Current Population Survey (CPS).

The  Annual  Housing  Survey.  The  AHS  is  a  survey  conducted  by 
the  Census  Bureau  for  the  Department  of  Housing  and  Urban 
Development. 

Its purpose is to assess the progress made toward realizing the
goal of a decent home and a suitable living environment for
every American family. A national survey is conducted every
year, and surveys are conducted every three or four years for
large metropolitan areas, including Denver. For Denver, the
most recent published data are for 1976; the most recent survey
was conducted in 1979-80. The data from that survey probably
will be available in late 1981 or early 1982. The survey universe
was all housing units in the Denver SMSA not including
institutions.

Data  can  be  obtained  from  the  survey  microdata  tape  for 
most  of  the  variables  of  interest  in  this  project  at  the  SMSA 
level  or  for  the  central  city.  In  addition,  the  survey  produced 
data  on  various  characteristics  of  Denver's  housing  stock.  The 
AHS  has  a  relatively  large  sample  size  in  Denver  (approximately 
2,500  housing  units)  so  that  it  can  be  considered  a  primary  data 
source  in  this  project. 

For  more  detail  on  the  AHS  in  Denver,  see  U.S.  Bureau  of 
the  Census  (1978). 

Survey  of  Income  and  Education  (SIE).  This  survey  was 
conducted  to  estimate  the  number  of  school-age  children  living 
in  poverty  families  in  each  State  and  to  estimate  the  number  of 
persons  who  are  in  need  of  bilingual  education  because  of 
limited  ability  to  speak  English.  The  survey  was  conducted  only 
once,  in  1976.  The  survey  universe  included  all  households  and 
persons  in  group  quarters;  it  excluded  the  institutionalized 
population.  The  survey  was  conducted  so  that  estimates  of 
approximately  equal  accuracy  would  be  obtained  for  each  State. 
Therefore,  the  sample  sizes  vary  widely  across  States.  The 
sample  size  for  Colorado  is  relatively  large. 

The  survey  microdata  file  enables  the  construction  of 
estimates  for  the  Denver  SMSA  or  the  central  city.  Data  can  be 
obtained  for  most  of  the  variables  of  interest  and  for  some 
associated  variables,  such  as  household  characteristics.  However, 
for  Denver  itself,  the  sample  size  (about  1,725)  is  small  in 
comparison  to  the  AHS,  so  estimates  from  the  SIE  are  probably 
less  reliable  than  estimates  from  the  AHS.  However,  SIE  data 
may  be  useful,  perhaps  in  combination  with  data  from  other 
sources. 

For  more  detail  on  the  SIE,  see  U.S.  Bureau  of  the  Census 
(1979). 

Annual  Demographic  File  (ADF)  of  the  Current  Population 
Survey (CPS). The primary purpose of the CPS is the
production of monthly national statistics on unemployment and
the labor force, for the population as a whole and for various
subgroups of the population. The CPS is taken monthly; the
ADF  is  compiled  using  March  CPS  data.  The  survey  universe 
includes  all  noninstitutionalized  persons  in  housing  units  and 
group  quarters.  For  the  Denver  SMSA,  the  sample  size  is  about 
600. 

For  more  detail  on  the  ADF,  see  U.S.  Bureau  of  the  Census 
(1974). 

The  ADF  can  be  used  to  obtain  estimates  of  most  of  the 
variables  of  interest  as  well  as  other  characteristics  of  the 
population and labor force. However, for SMSA's these
estimates may be unacceptable for use in this project. The
State-level  estimates  are  more  accurate,  but  their  usefulness  in 
this  project  can  only  be  determined  after  further  research. 
State and Local Survey Data

One  important  survey  has  been  identified.  The  Denver  Office 
of  Policy  Analysis  conducted  a  housing  survey  in  1978.  Data  on 
demographics, economic status, occupation, and employment
status  were  collected  on  a  sample  of  households  in  the  central 
city  of  Denver  (see  City  and  County  of  Denver,  1979a  and 
1979b).  The  microdata  file,  which  contains  5,212  records,  is 
available  on  computer  tape.  This  may  be  the  best  source  of 
survey  data  on  Denver.  However,  it  is  a  one-time  survey  that 
probably  will  not  be  repeated.  Also,  it  may  not  be  possible  to 
compare  the  data  from  this  survey  to  data  from  other  surveys 
because  there  may  be  differences  in  definitions. 

Administrative  Registers  from  Federal  Sources 

One  important  data  source  is  the  vital  statistics  records  kept 
by  the  Department  of  Health,  Education,  and  Welfare,  including 
data  on  births  and  deaths  down  to  the  county  level. 

Knott  (1979)  has  presented  a  summary  of  other  major 
Federal  administrative  registers  that  are  of  statistical  interest, 
have  extensive  coverage  of  a  population  (either  individuals  or 
businesses),  and  are  maintained  by  computer.  This  discussion 
concentrates  on  the  files  that  cover  individuals  and  in  which  the 
individual  records  can  be  geographically  located  by  State  or  for 
smaller  areas. 

The following files were studied partially for the purpose of
preparing documentation on their current and potential
statistical uses:

1.  Bureau of the Census:
      1960 and 1970 Censuses of Population

2.  Office of Personnel Management:
      Central Personnel Data File
      Civil Service Annuity Roll

3.  Department of Defense:
      Active Military Personnel Data File
      Military Retirement Compensation File

4.  Department of Transportation:
      National Driver Register

5.  Internal Revenue Service:
      Individual Master File

6.  Office of Education:
      Basic Education Opportunity Grant

7.  Railroad Retirement Board:
      Research Master Beneficiary File
      Service and Compensation (SCORE)
      Railroad Retirement Survivor, and Pension Benefit Payment File

8.  Social Security Administration:
      Summary Earning Records
      Master Beneficiary Record
      Numerical Identification File (SS-5)

9.  U.S. Coast Guard:
      Personnel Management Information System
      Retired Officers Support System
      Retired Pay and Personnel System

10. Veterans Administration:
      Compensation and Pension Master Record
      Insurance (In-Force) Master Record File
      Education Master Record File
      Vocational Rehabilitation and Education Statistical File
      Insurance Awards Master File
      Education Master File

All  of  these  files  are  described  in  more  detail  by  Knott  (1979), 
who  makes  general  comments  on  their  potential  for  use  in 
making  estimates.  Several  of  those  comments  are  pertinent  to 
this  project. 

In  terms  of  coverage,  the  census  files  are  the  most  complete, 
followed  by  the  Social  Security  Administration  Summary 
Earning  Files  and  the  Internal  Revenue  Service  Individual 
Master  File.  While  no  other  files  have  the  same  breadth  of 
coverage,  some  of  the  other  files  cover  comprehensively  a 
particular  segment  of  the  population.  The  Social  Security  Files 
cover the elderly population. The Office of Personnel
Management Central Personnel File covers Federal workers, who are
not  covered  under  Social  Security.  The  military  personnel  data 
files  cover  another  group  that  is  important  in  making  accurate 
population  estimates. 

Unfortunately,  little  coded  geography  exists  on  these  files. 
Some  contain  a  State  code  usually  derived  from  a  mailing 
address.  This  lack  of  residence  geography  presents  a  major 
problem  in  using  these  files  in  this  project.  It  is  possible,  by 
using  mailing  addresses,  to  assign  subcounty  geography  using  a 
Geographic  Base  File,  but  several  problems  arise  if  this  task  is 
attempted.  For  details,  see  Knott  (1979,  p.  6).  In  addition,  the 
cost  of  assigning  geography  with  a  Geographic  Base  File  system 
is  high. 

Another  approach  to  assigning  geography  to  file  records  is 
the  addition  of  a  residence  geographic  code.  This  was  done  for 
the  1972  and  1975  Internal  Revenue  Service  Individual  Master 
Files  to  help  prepare  population  estimates  for  use  in  the 
Revenue  Sharing  Program  using  the  administrative  records 
method  discussed  above.  This  approach  is  quite  expensive. 

In  summary,  there  are  several  Federal  administrative  files 
that  are  being  investigated  for  possible  use  in  this  project. 
Preliminary  study  indicates  that  most  of  them  do  not  have  the 
proper  geographic  detail.  To  supply  this  detail  may  be  possible, 
but  it  would  certainly  be  expensive.  Investigations  will  be  made 
to decide conclusively whether the expense would be prohibitive.

Administrative  Registers  from  State  and  Local  Sources 

The  following  sources  of  administrative  data  are  available 
from  Denver.  It  should  be  noted  here  that  checks  will  be  made 
on  the  quality  of  all  local  data  used  in  this  project. 

Several  local  files  contain  data  that  may  be  useful  in  the 
project.  The  Denver  Department  of  Health  and  Hospitals 
maintains  two  files  containing  yearly  data  on  births  and  deaths. 
Each  file  includes  somewhat  less  than  10,000  records.  These 
data  include  demographic  characteristics  such  as  race  and  sex  of 
newborn  babies;  and  age,  race,  and  sex  of  deceased  persons.  All 
file records are coded by census tract of residence, and the file is
currently  being  automated.  These  data  may  be  useful  in  several 
of  the  estimation  methods. 

The  Denver  Department  of  Social  Services  maintains  a 
welfare  recipients  file  that  contains  records  on  individuals 
within  families  receiving  welfare.  For  certain  areas  of  the  city 
these  data  may  be  of  help  in  estimation.  The  file  contains  data 
on  the  type  of  assistance;  household  size;  and  on  age,  race,  and 
sex  of  recipients  and  their  family  members.  All  individual 
records  are  coded  by  census  tract.  The  file  contains  approxi- 
mately 50,000  records,  one  for  each  family  member.  Use  of  this 
file  would  require  the  removal  of  individual  identifiers  from 
records  to  maintain  confidentiality. 

The  Denver  Motor  Vehicles  Division  maintains  a  registered 
vehicles  file  containing  such  data  items  as  the  address  of  the 
vehicle  owner  and  the  price  and  age  of  the  vehicle.  The  file 
contains  about  500,000  records  and  has  no  confidentiality 
restrictions.  Records  are  at  the  address  level,  for  which  the 
coding  is  consistent  with  the  city's  address  coding  guide  and  the 
Census  Bureau's  GBF/DIME.  The  usefulness  of  the  data  would 
depend  on  the  rate  of  successful  address  matching. 

The  Denver  Election  Commission  maintains  a  registered 
voters  file  containing  demographic  data  on  individual  registered 
voters.  The  file  includes  over  230,000  records,  but  the  size  of 
the  file  depends  on  the  nature  of  the  upcoming  elections.  The 
records  are  coded  at  the  address,  precinct,  and  city  council 
district  level.  There  are  plans  to  code  the  data  by  census  tract. 

The  Denver  Public  Schools  maintain  an  annual  student  file 
containing  basic  demographic  data  on  students  in  the  public 
schools.  The  file  has  about  70,000  records. 

Finally,  the  Denver  Planning  Office  maintains  a  land-use  file 
containing  information  on  the  use  and  characteristics  of  the 
city's  housing  stock.  The  file  contains  153,000  records,  one  for 
each  ownership  parcel.  All  records  include  address,  census  tract, 
block,  and  other  geographic  information.  The  file  is  updated 
each  year.  City  officials  feel  that  this  is  a  mature  information 
system  with  highly  reliable  contents.  It  has  been  in  use  since 
1973  and  appears  to  be  a  promising  data  source. 

RESULTS¹

In  this  section,  some  results  are  presented  from  initial 
experimentations  with  the  categorical  data  analysis  method 
(described  at  the  end  of  the  section  "Methods").  These  results 
are  not  considered  to  be  final  in  any  sense;  nevertheless,  they 
are  useful  in  illustrating  the  approach  that  is  being  taken, 
pointing  out  problems  that  are  being  encountered,  and  sug- 
gesting future  directions  for  the  research  to  take. 

Estimation  was  attempted  for  the  city  as  a  whole  and  for  the 
city's  10  communities.  In  both  cases,  the  association  structure 
data  were  taken  from  the  1960  census,  while  the  allocation 
structure  data  were  taken  from  the  1970  census.  The  estimates 
made  were  for  1970;  this  was  done  so  that  the  results  could  be 
verified  and  to  eliminate  sampling  error  as  a  possible  source  of 
estimation  error. 


¹The authors acknowledge the contribution of Mr. Gregg Diffendal to
this section. Any errors remain the authors' responsibility.


Four  sets  of  city-level  estimates  were  made.  In  each,  the 
association  structure  was  the  1960  age-by-race-by-sex  table  of 
Denver's population. The following allocation structures were
used, consisting of four sets of marginal totals from the 1970
age-by-race-by-sex table for Denver:

1. age and race (abbreviated (A,R));

2. age, race, and sex (A,R,S);

3. age by race (AxR); and

4. age by race and sex (AxR,S).

The  age  groups  considered  were  0  to  4  years,  5  to  19  years, 
20  to  44  years,  45  to  59  years,  and  60  years  and  over.  These  age 
groups  correspond  as  closely  as  possible  to  those  for  which 
Denver  wants  to  obtain  estimates.  The  race  categories  are  White 
and  non-White. 
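The allocation step can be illustrated with a small sketch. It assumes that the categorical data analysis method amounts to iterative proportional fitting: the base-year table (the association structure) is rescaled repeatedly until its margins agree with the current-year totals (the allocation structure). The function name `fit_to_margins`, the category labels, and all counts below are hypothetical; they are not Denver data.

```python
from itertools import product


def fit_to_margins(seed, margins, sweeps=200):
    """Rescale `seed` until its margins match the target totals.

    seed    -- dict mapping (age, race, sex) -> base-year count
    margins -- list of (axes, targets); `axes` is a tuple of positions in
               the cell key and `targets` maps the projected key to the
               required current-year total
    """
    table = {cell: float(v) for cell, v in seed.items()}
    for _ in range(sweeps):
        for axes, targets in margins:
            # Current totals of the table over this margin.
            sums = {}
            for cell, v in table.items():
                key = tuple(cell[i] for i in axes)
                sums[key] = sums.get(key, 0.0) + v
            # Scale every cell so the margin matches its target.
            for cell in table:
                key = tuple(cell[i] for i in axes)
                table[cell] *= targets[key] / sums[key]
    return table


# Hypothetical 2 x 2 x 2 base-year table (age, race, sex).
ages, races, sexes = ("young", "old"), ("W", "NW"), ("M", "F")
base_counts = [300, 290, 30, 28, 330, 430, 15, 16]
seed = dict(zip(product(ages, races, sexes), base_counts))

# Hypothetical current-year one-way marginals, i.e. structure (A,R).
age_totals = {("young",): 400.0, ("old",): 800.0}
race_totals = {("W",): 1050.0, ("NW",): 150.0}

est = fit_to_margins(seed, [((0,), age_totals), ((1,), race_totals)])
```

Under this assumption, structure (AxR) would correspond to passing a single two-way margin keyed on positions `(0, 1)`. Because each scaling factor depends only on age or on race, the male-to-female ratio inside every age-by-race cell is carried over unchanged from the base year.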

The  estimation  results  are  shown  in  table  2,  together  with  the 
correct  population  totals  and  the  percent  errors  of  the  esti- 
mates. 
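As a check on how the percent errors are read, the following sketch recomputes them for the White male column under the (A,R) structure, using the figures printed in table 2. It assumes the percent error is defined as 100 x (estimate - actual) / actual; the function name is illustrative only.

```python
def percent_error(estimate, actual):
    """Signed percent error of an estimate relative to the census count."""
    return 100.0 * (estimate - actual) / actual


# White males, 1970 census counts by age group (table 2).
actual_1970 = [18329, 57605, 73919, 35269, 31316]
# Estimates for the same cells under allocation structure (A,R).
est_ar = [18251.2, 59872.2, 73781.8, 36182.3, 33120.0]

errors = [round(percent_error(e, a), 2) for e, a in zip(est_ar, actual_1970)]
print(errors)  # [-0.42, 3.94, -0.19, 2.59, 5.76]

# Mean absolute percentage error over the five age groups.
mape = sum(abs(percent_error(e, a)) for e, a in zip(est_ar, actual_1970)) / len(actual_1970)
```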

Use  of  the  1970  (A,R)  marginals  gives  fairly  precise  estimates 
of  the  1970  White  population  under  60.  For  Whites  over  60,  the 
errors  were  over  5  percent,  and  the  other  marginals  tried  do  not 
yield  much  better  results.  The  main  reason  for  this  is  probably 
the increasing life expectancy of females. For the non-White
population, the errors are quite large for some categories,
perhaps  accounted  for  by  the  smaller  magnitude  of  the  actual 
numbers. 

The  inclusion  of  the  sex  marginal  (structure  (A,R,S))  yields 
very  little  improvement.  A  decrease  in  error  for  the  60  years  and 
over  White  population  occurs,  but  the  error  for  the  rest  of  the 
White  population  just  fluctuates  around  that  for  the  (A,R) 
estimate.  For  non-Whites,  the  inclusion  of  the  sex  marginals 
gives  very  little  improvement.  Therefore,  it  seems  that  the 
estimates  for  age  and  race  should  be  obtained  without  the 
inclusion  of  sex. 

If  estimates  of  males  and  females  are  desired,  a  more  fruitful 
approach  might  consist  of  multiplying  by  constants  that  reflect 
the  sex  composition  of  each  age  group. 
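A minimal sketch of this idea follows. It assumes the constant for an age group is simply the male share of that group in the base-year table; the text does not say how the constants would be chosen. The 1960 counts are taken from table 2, and the 1970 White total for ages 0 to 4 (18,329 + 17,523) stands in for an age-by-race estimate.

```python
def split_by_sex(age_race_estimate, base_male, base_female):
    """Split an age-by-race estimate using the base-year sex composition."""
    male_share = base_male / (base_male + base_female)
    male = age_race_estimate * male_share
    return male, age_race_estimate - male


# 1960 White population aged 0 to 4: 30,442 males and 29,329 females.
male, female = split_by_sex(35852.0, 30442, 29329)
```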

Use  of  the  (AxR)  marginals  gives  considerable  improvement 
over  the  (A,R)  estimates,  especially  for  the  population  under  44 
and  for  the  non-White  population  over  60  years  old.  The  largest 
error  is  now  less  than  10  percent.  Knowledge  of  the  age  by  race 
interaction  has  more  importance  here  than  knowledge  of  the  sex 
marginal. 

Use  of  the  (AxR,S)  marginals  reduces  the  largest  errors  while 
increasing  some  of  the  smaller  errors,  especially  in  the  young  age 
groups.  This  appears  to  be  the  effect  of  a  change  in  the  age  by 
sex  interaction  over  the  period  1960  to  1970.  In  particular,  for 
non-Whites  over  60,  the  error  is  now  over  5  percent;  the  small 
number in this age group in Denver probably accounts for this.

In  general,  the  estimates  are  very  good  for  young  ages  and  for 
Whites  under  60  and  are  poorer  for  non-Whites  and  the  old  (60 
years  and  over).  The  large  errors  for  non-Whites  are  probably 
caused  at  least  in  part  by  the  large  increase  in  the  non-White 
population  between  1960  and  1970  and  by  the  rather  small 
non-White  population. 

Three possible ways to improve the estimates include not
using the sex marginal, aging the population, and reducing the
structure of the full associated table so that some interactions
are assumed to be zero. Elimination of the sex marginal has
been discussed and will be tried at a later date. Aging the
1960 population could be accomplished through use of a life
table (perhaps for Colorado). Reducing the structure of the full
table may help to reduce certain errors in the table. Although
the 3-factor interaction probably is significant, significance may
only be due to the large overall number of observations. An
examination of the parameters in the "saturated" (full) log-
linear model may help in deciding which interactions to
eliminate.

Table 2. Results From Application of Categorical Data Analysis Method: City Level

(Percent errors in parentheses)

Age and structure   White                                 Non-White
                    Male               Female             Male               Female

0 TO 4 YEARS
  (1970 Actual)     18,329             17,523             3,123              2,947
  (A,R)             18,251.2 (-0.42)   17,583.9 (0.35)    3,085.1 (-1.21)    3,001.8 (1.86)
  (A,R,S)           17,876.1 (-2.47)   17,957.6 (2.78)    3,022.2 (-3.23)    3,066.1 (4.04)
  (AxR)             18,259.8 (-0.38)   17,592.2 (0.39)    3,076.5 (-1.49)    2,993.5 (1.58)
  (AxR,S)           17,890.8 (-2.39)   17,961.2 (2.50)    3,014.1 (3.49)     3,055.9 (3.70)
  (1960 Actual)     30,442             29,329             2,777              2,702

5 TO 19 YEARS
  (1970 Actual)     57,605             57,667             9,586              9,685
  (A,R)             59,872.2 (3.94)    58,466 (1.39)      8,028.8 (-16.24)   8,176 (-15.58)
  (A,R,S)           58,631.5 (1.78)    59,698.2 (3.52)    7,863.6 (-17.97)   8,349.7 (-13.79)
  (AxR)             58,320.9 (1.24)    57,951.1 (1.24)    9,547.9 (-0.40)    9,723.1 (0.39)
  (AxR,S)           57,134.4 (-0.82)   58,137.6 (1.82)    9,349.7 (-2.47)    9,921.3 (2.44)
  (1960 Actual)     69,313             67,685             5,016              5,108

20 TO 44 YEARS
  (1970 Actual)     73,919             80,162             9,102              10,439
  (A,R)             73,781.8 (-0.19)   78,458.3 (-2.13)   10,303.6 (13.20)   11,078.3 (5.12)
  (A,R,S)           73,191.3 (-2.34)   80,043.7 (-0.15)   10,083.1 (10.78)   11,303.9 (8.29)
  (AxR)             74,674 (1.02)      79,407 (-0.97)     9,416.5 (3.46)     10,124.5 (-3.01)
  (AxR,S)           73,090.7 (-1.12)   80,990.3 (1.03)    9,215.8 (1.25)     10,325.2 (-1.09)
  (1960 Actual)     86,302             91,772             6,504              6,993

45 TO 59 YEARS
  (1970 Actual)     35,269             40,487             3,330              3,739
  (A,R)             36,182.3 (2.59)    29,512.3 (-2.41)   3,586.3 (7.70)     3,544.1 (-5.21)
  (A,R,S)           35,396.4 (0.36)    40,304.0 (-0.45)   3,508.9 (5.37)     3,615.7 (-3.30)
  (AxR)             36,211.6 (2.67)    39,544.4 (-2.33)   3,555.4 (6.77)     3,513.6 (6.03)
  (AxR,S)           35,434.2 (0.47)    40,321.8 (-0.41)   3,482.6 (4.58)     3,586.4 (-4.08)
  (1960 Actual)     41,354             45,160             2,212              2,186

60 YEARS AND OVER
  (1970 Actual)     31,316             45,910             2,010              2,530
  (A,R)             33,120 (5.76)      42,958.9 (6.43)    2,735 (36.07)      2,952 (16.68)
  (A,R,S)           32,344.5 (3.28)    43,743.7 (4.72)    2,671.4 (32.91)    3,006 (18.81)
  (AxR)             33,619.4 (7.36)    43,606.6 (5.02)    2,183.4 (8.63)     2,356.6 (6.85)
  (AxR,S)           32,839.9 (4.87)    44,386.1 (3.32)    2,136.8 (6.31)     2,403.2 (5.01)
  (1960 Actual)     33,659             43,658             1,500              1,619

The  categorical  data  analysis  method  was  also  tried  at  the 
subcity  level.  The  variable  of  interest  was  the  1970  population 
in  each  age  group  for  each  community.  The  association 
structure  was  taken  to  be  either  the  1960  age-by-race-by-sex 
table  of  total  population  for  Denver's  10  communities  or  the 
1960  age-by-sex  table.  The  allocation  structure  in  these  experi- 
ments included  the  1970  age-by-race-by-sex  totals  and  the  1970 
age-by-sex  totals  for  the  city  as  a  whole.  In  addition,  for  two 
applications  of  the  method,  it  was  assumed  that  the  1970  total 
population  figures  for  each  community  were  available. 

The  results  of  applying  Purcell's  method  are  summarized  in 
table  3.  In  general,  the  estimates  can  be  described  as  inaccurate, 
but  a  closer  look  reveals  reasons  for  the  inaccuracy  and  some 
suggestions  for  the  future  direction  of  the  research. 

The  estimation  is  inaccurate  for  several  reasons.  First,  the 
method  assumes  that  the  interaction  between  age,  race,  and  sex 
and the communities remains constant from 1960 to 1970. This
is  clearly  not  true;  it  is  even  more  emphatically  not  true 
between  1970  and  1980  (e.g.,  because  young  professionals 
began  moving  into  certain  central  city  neighborhoods  during  the 
1970's  and  because  the  life  expectancy  of  women  has  continued 
to  increase). 

A  second  reason  for  the  inaccuracy  of  estimation  is  that  the 
associated  variables  (sex  and  race)  probably  are  not  sufficiently 
related  to  the  age  distribution  of  the  population  to  allow 
accurate  estimation.  The  sex  distribution  of  the  population 
probably  has  some  effect  on  the  age  distribution  (because 
females  tend  to  live  longer  than  males),  but  the  race  distribution 
appears  not  to  have  much  effect  on  the  age  distribution.  In  fact, 
comparison  of  the  errors  in  table  3  reveals  that  the  use  of  race  in 
Purcell's  estimation  method  makes  the  results  slightly  less 
accurate than if race is not used. (Compare structure A1 with
B1, and A2 with B2.) This possibility was raised by Purcell
(1979,  p.  90).  Thus,  if  Purcell's  method  is  to  be  applied 
successfully  at  the  subcity  level,  more  associated  variables  are 
needed  that  are  strongly  related  to  the  variable  of  interest. 

Comparisons of structure A1 with A2 and of structure B1
with B2 in table 3 also reveal that the use of accurate current
estimates of the population of the small domains (i.e., Denver's
communities) can greatly aid in estimation. It is important to
note that these extra data should be reasonably accurate; if they
are not, the results can actually be made worse (Purcell, 1979,
p. 90). Therefore, it will be desirable, though difficult, to find
some  reasonably  accurate  symptomatic  data  for  Denver's 
communities.  Local  data  will  be  especially  vital  in  this  effort. 
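The effect of knowing current community totals can be sketched with a two-way table of communities by age groups. The sketch assumes the allocation is done by iterative proportional scaling; the function name `rake`, the two communities, and every count are hypothetical. Without community totals the table is scaled only to the city-wide age totals (as in structures A1 and B1); with them it is scaled to both sets of totals (as in A2 and B2).

```python
def rake(base, col_totals, row_totals=None, sweeps=200):
    """Rescale a community-by-age table to current totals.

    base       -- list of rows, one per community, one column per age group
    col_totals -- current city-wide total for each age group
    row_totals -- optional current total population of each community
    """
    table = [[float(v) for v in row] for row in base]
    for _ in range(sweeps):
        # Match the city-wide age totals.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
        # Match the community totals, when they are available.
        if row_totals is not None:
            for row, target in zip(table, row_totals):
                s = sum(row)
                for j in range(len(row)):
                    row[j] *= target / s
    return table


base = [[50, 120, 80],    # community 1, base year
        [30, 100, 120]]   # community 2, base year
city_age = [100.0, 260.0, 240.0]      # current city totals by age group
community_totals = [350.0, 250.0]     # current totals by community

without_totals = rake(base, city_age)
with_totals = rake(base, city_age, community_totals)
```

In this toy case the first community has grown faster than the second. The scaling that uses only city-wide age totals cannot see that shift and leaves each community near half of the city total, while the second scaling reproduces the community totals exactly.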


Finally, note that in all the applications in this paper, the marginal
totals are the correct marginals from the 1970 census; usually in
a  real-world  application,  data  from  a  recent  sample  survey  are 
used  for  the  marginals.  Therefore,  the  sample  estimates  are 
likely  to  contain  sampling  errors,  which  may  increase  the 
estimation  errors.  But  note  also  that  the  estimates  are  calculated 
for  a  period  10  years  ahead  of  the  association  structures, 
perhaps  the  maximum  needed  to  estimate  the  population  (after 
11  or  12  years,  actual  census  data  should  be  available). 
Therefore,  the  large  estimation  errors  are  not  discouraging. 

CONCLUSIONS  AND  RECOMMENDATIONS 

This  paper  has  outlined  an  approach  to  attempt  to  make 
recent  developments  in  estimating  socioeconomic  data  items 
available  to  Denver  and,  it  is  hoped,  to  other  cities.  The 
approach  consists  of  attempting  to  apply  estimation  and 
projection  techniques  with  existing  Federal,  State,  and  local 
data  from  censuses,  surveys,  and  administrative  registers. 

Currently,  the  project  is  in  the  first  stages  of  experimentation 
with  the  various  estimation  methods,  and  some  of  these 
experiments  have  been  reported  here  to  give  an  illustration  of 
the  research  approach.  The  first  method  that  has  been  tried  is 
the  categorical  data  analysis  approach,  in  which  large-domain 
totals  are  allocated  to  smaller  domains  using  a  log-linear  model. 
The  results  of  this  experiment  can  be  used  to  suggest  some 
future  directions  the  research  should  take.  First,  it  is  clear  that 
more  associated  variables  must  be  found  to  provide  improved 
symptomatic  indicators  that  can  help  in  estimating  values  of  the 
variables  of  interest.  In  addition,  as  Purcell  indicates  in  his 
dissertation,  the  use  of  variables  that  are  not  strongly  related  to 
the  variables  of  interest  can  actually  make  estimation  results 
worse.  Some  possible  variables  that  might  improve  the  esti- 
mation results  include  data  from  the  Annual  Housing  Survey 
(e.g.,  income  when  estimating  the  racial  distribution  of  the 
population)  or  from  city  administrative  registers  (e.g.,  the 
average  value  of  the  housing  stock  or  the  number  of  welfare 
recipients).  It  is  clear  that  local  data  will  play  a  major  role  here. 

Even  in  the  crude  analysis  summarized  here,  possession  and 
use  of  data  on  the  small  domains  (e.g.,  Denver's  ten  commu- 
nities) can  drastically  reduce  estimation  error.  Unfortunately, 
such  data  are  difficult  to  get,  but  Denver's  1978  housing  survey 
and some other administrative data the city is providing may
supply some of these needed data.

Finally,  it  must  be  emphasized  that  one  of  the  most  difficult 
aspects  of  small-area  estimation  is  validation  of  the  methods 
that  are  used.  Generally,  validation  is  only  possible  if  the  correct 
values  of  the  variable  of  interest  are  already  known,  which  is 
often the case only in census years. Since the results of the 1980
Census  of  Population  will  be  available  in  late  1981  or  early 
1982,  the  Census  Bureau  and  the  City  of  Denver  are  now  in  a 
good  position  to  validate  the  estimation  and  projection  methods 
that  are  being  tried  in  this  project.  By  the  time  the  data  are 
available  for  validation  of  the  methods,  the  Bureau  and  the  city 
will  have  ready  the  data  that  are  necessary  for  application  of  the 
methods  and  some  experience  in  using  them. 
Table 3. Estimation of Age Distribution of Population in Denver's Communities: Preliminary Results

(est. = estimates; err. = percent error)

Community and age     Actual*  A1 est.  A1 err.  A2 est.  A2 err.  B1 est.  B1 err.  B2 est.

NW (Northwest) community
  0 to 4 years          5,169    5,170     0.00    4,702    -9.03    5,472     5.86    4,775
  5 to 19 years        16,387   17,614     7.49   15,997    -2.38   18,768    14.53   16,258
  20 to 44 years       18,158   21,768    19.88   19,524     7.52   22,621    24.58   19,559
  45 to 59 years       11,255   12,882    14.46   11,649     3.50   13,348    18.60   11,657
  60 years and over    13,769   13,901     0.96   12,865    -6.57   14,115     2.51   12,489

NC (North Central) community
  0 to 4 years          6,022    9,742    61.77    6,133     1.84    7,996    32.98    5,884
  5 to 19 years        16,781   28,980    72.70   18,243     8.71   21,121    25.86   16,158
  20 to 44 years       19,447   33,546    72.50   20,298     4.38   28,561    46.87   20,822
  45 to 59 years        9,065   14,813    63.41    9,127     0.68   12,506    37.96    9,210
  60 years and over    10,599   14,199    33.97    8,113   -23.46   13,179    24.34    9,840

NE (Northeast) community
  0 to 4 years          4,210    3,034   -27.93    3,325   -21.02    3,158   -24.99    3,339
  5 to 19 years        13,239   10,239   -22.66   11,132   -15.92   10,732   -18.94   11,263
  20 to 44 years       14,439   12,775   -11.52   13,831    -4.21   13,063    -9.53   13,683
  45 to 59 years        5,097    6,344    24.47    6,865    34.69    6,501    27.55    6,879
  60 years and over     3,884    5,251    35.20    5,716    47.17    5,322    37.02    5,705

EC (East Central) community
  0 to 4 years          3,023    2,577   -14.75    2,958    -2.15    2,720   -10.02    3,005
  5 to 19 years        10,870   11,200     3.04   12,712    16.95   11,976    10.17   13,128
  20 to 44 years       16,607   15,535     6.46   17,709     6.63   15,866    -4.46   17,368
  45 to 59 years        8,908    7,103   -20.26    8,096    -9.11    7,347   -17.52    8,122
  60 years and over     8,468    5,499   -35.06    6,400   -24.42    5,580   -34.10    6,252

SE (Southeast) community
  0 to 4 years          3,625    2,714   -25.13    4,268    17.74    2,894   -20.17    4,329
  5 to 19 years        11,846    7,977   -32.66   12,468     5.25    8,608   -27.33   12,780
  20 to 44 years       14,624   10,085   -31.04   15,678     7.21   10,536   -27.95   15,614
  45 to 59 years        6,856    2,937   -57.16    4,695   -31.52    3,028   -55.83    4,535
  60 years and over     2,621    1,508   -42.46    2,464    -5.99    1,526   -41.78    2,319

SC (South Central) community
  0 to 4 years          3,821    3,193   -16.44    3,643    -4.66    3,410   -10.76    7,337
  5 to 19 years        14,505   12,804   -11.73   14,551     0.32   13,816    -4.75   15,012
  20 to 44 years       19,950   16,657   -16.51   18,860    -5.46   17,385   -12.86   18,855
  45 to 59 years       11,707   11,226    -4.11   12,853     9.79   11,664    -0.37   12,776
  60 years and over    13,080   11,238   -14.08   13,155     0.57   11,432   -12.60   12,687

Table 3 (continued)

Community and age     Actual*  A1 est.  A1 err.  A2 est.  A2 err.  B1 est.  B1 err.  B2 est.  B2 err.

SW (Southwest) community
  0 to 4 years          4,881    6,110    25.18    6,573    34.67    6,520    33.58    6,669    36.63
  5 to 19 years        19,944   17,399   -12.76   18,716    -6.16   18,708    -6.20   18,993    -4.77
  20 to 44 years       18,819   21,262    12.98   22,629    20.25   22,218    18.06   22,515    19.64
  45 to 59 years        9,372    5,079   -45.81    5,493   -41.39    5,271   -43.76    5,397   -42.41
  60 years and over     3,682    2,964   -19.50    3,288   -10.70    3,010   -18.25    3,125   -15.13

WC (West Central) community
  0 to 4 years          8,327    6,693   -19.62    7,543    -9.42    7,244   -13.01    7,402   -11.11
  5 to 19 years        22,357   20,634    -7.71   21,925    -1.93   21,623    -3.28   21,931    -1.91
  20 to 44 years       23,413   23,169    -1.04   24,105     2.96   23,906     2.11   24,200     3.36
  45 to 59 years       10,484   10,435    -0.47   10,872     3.70   10,775     2.78   11,019     5.10
  60 years and over    10,030    9,570    -4.59   10,165     1.35    9,705    -3.24   10,059     0.29

C (Central) community
  0 to 4 years          2,824    2,354   -16.64    2,753    -2.51    2,476   -12.32    2,767    -2.02
  5 to 19 years         8,513    7,545   -11.37    8,695     2.14    8,053    -5.40    8,935     4.96
  20 to 44 years       27,414   17,720   -35.36   20,303   -25.94   18,373   -32.98   20,342   -25.60
  45 to 59 years        9,319   10,789    15.77   12,430    33.38   11,167    19.83   12,483    33.95
  60 years and over    14,570   15,753     8.12   18,459    26.69   16,004     9.84   18,112    24.31

CBD (Central Business District) community
  0 to 4 years             20       34    50.00       24    20.00       31    55.00       19    -5.00
  5 to 19 years           101      151    49.50      104     2.97      138    36.63       84   -16.83
  20 to 44 years          751    1,103    46.87      684    -8.92    1,092    45.40      664   -11.58
  45 to 59 years          762    1,216    59.58      744    -2.36    1,216    59.58      748    -1.84
  60 years and over     1,063    1,883    77.14    1,140     7.24    1,893    78.08    1,182    11.19

MAPE                      (X)      (X)     26.8      (X)     14.5      (X)     22.8      (X)     12.3

RAE (percent error columns only; all other columns (X)):
  Structure A1: (0.0, 77.14)
  Structure A2: (0.32, 47.17)
  Structure B1: (0.37, 78.08)
  Structure B2: (0.29, 46.88)

* Obtained from 1970 census, second count, file A, table 2.

Structure A1 consists of (a) Variable of interest: 1970 age distribution of population for the ten communities; (b) Association structure: 1960 age,
race, and sex distribution of the population for the ten communities; (c) Allocation structure: age, race, and sex distribution for the city as a whole.
Structure A2 is the same as structure A1 except that in the allocation structure, 1970 population totals for the ten communities are known.
Structure B1 consists of (a) Variable of interest: 1970 age distribution of the population for the ten communities; (b) Association structure: 1960
age and sex distribution of the population for the ten communities; (c) Allocation structure: 1970 age and sex distribution for the city as a whole.
Structure B2 is the same as structure B1 except that in the allocation structure, 1970 population totals for the ten communities are known.
MAPE: Mean (average) Absolute Percentage Error.
RAE: Range of Absolute Percentage Error (= maximum absolute error - minimum absolute error).
Dodge Local Construction Potentials

John  H.  Morawetz 
McGraw-Hill  Information  Systems  Company 


It  is  not  often  that  a  new  statistical  series  is  commenced  in 
the  United  States.  Even  more  infrequently  do  we  hear  of  new 
series  introduced  by  the  private  sector  of  the  economy.  It  is, 
therefore,  an  unusual  privilege  for  me  to  tell  you  about  a  new 
construction  statistics  series  that  was  recently  introduced  by 
the  McGraw-Hill  Information  Systems  Company. 

The  data  base  underlying  this  new  series  has  been  in  ex- 
istence for  some  time.  Yet,  the  specific  series  is  new  and  will 
be  known  as  Dodge  Local  Construction  Potentials. 

Computer  technology  has  made  it  economically  feasible  to 
generate  statistics  by  31  categories  of  construction  for  any 
county or group of counties in the 50 States on a monthly
basis. 

Prior  to  an  explanation  of  the  specific  output  configurations, 
I  want  to  familiarize  you  with  the  nature  of  the  data  inputs.  The 
F. W. Dodge Division of McGraw-Hill, Inc., is a national
construction  information  collection  organization  consisting  of 
about  400  full-time  and  1,400  part-time  reporters/enumerators. 
Each  is  assigned  a  unique  geographic  area  the  size  of  which 
varies  with  construction  activity.  It  is  this  field  organization's 
primary  responsibility  to  report  construction  related  activities 
on  all  construction  projects,  new,  and  major  additions  and 
alterations,  until  construction  on  each  project  is  actually  started. 
This  includes  the  gathering  of  information  from  architects, 
contractors,  major  owners,  building  departments,  etc.,  from  the 
earliest  structural  design  intentions,  through  competitive  bid- 
ding, the  awarding  of  contracts,  the  floating  of  bond  issues,  and 
other  pre-construction-start  events. 

This  information  is  not  primarily  collected  for  statistical 
services.  The  Dodge  Division  sells  this  information,  known  as 
Dodge  Reports,  in  the  marketplace.  Statistical  entries  are  made 
from  these  reports  when  each  structure  is  about  to  start. 

Projects  valued  at  under  325,000  are  intentionally  excluded 
from  this  information/data  base.  In  addition,  the  Dodge  data 
are  also  subject  to  the  traditional  "census"  voids  since  not  all 
projects  are  captured.  There  is  a  selection  bias  regarding  those 


excluded.  Small  projects,  those  in  remote  areas,  pre-engineered 
or  prefabricated  structures,  and  those  built  by  the  owner's 
own  forces  are  more  readily  missed  than  others.  While  there  is  no 
continuing  measure  of  these  voids,  there  is  empirical  evidence 
that  they  account  for  10  to  15  percent  of  the  new  construction 
universe  and  that  this  void  is  fairly  stable  over  time. 
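The arithmetic implied by a stable void can be sketched as follows. The sketch assumes that captured activity equals the universe times (1 - void), so a captured figure can be scaled up to an approximate universe range; the text does not state that Dodge publishes such an adjustment, and the function name and the captured value are hypothetical.

```python
def universe_range(captured, void_low=0.10, void_high=0.15):
    """Approximate universe total implied by a 10 to 15 percent void."""
    return captured / (1.0 - void_low), captured / (1.0 - void_high)


# Hypothetical captured construction value, in thousands of dollars.
low, high = universe_range(9000.0)
```

With a captured value of 9,000, the implied universe lies between roughly 10,000 and 10,590.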

While  the  Dodge  Reporters/Enumerators  also  collect  one- 
and  two-family  house  information,  all  house  data  are  based 
upon  building  permits  and  on  a  sampling  effort  in  areas  not 
requiring  building  permits. 

The  just-described  statistical  information  is  generated  each 
month  for  each  county  in  the  United  States,  for  each  of  209 
structure  categories.  A  variety  of  services  has  been  made  avail- 
able from  this  data  base  for  some  time,  both  to  the  Government 
and  to  the  private  user. 

The  new  service  has  been  made  possible  and  economically 
feasible through the use of a laser printer, Xerox No. 9700.
A standardized format is used for all reports in all areas of the
country.  It  provides  for  monthly  data  by  19  building  and 
12  nonbuilding  categories  for  the  previous  month,  the  cumula- 
tive to  date,  and  comparable  data  for  the  prior  year.  Construc- 
tion is  measured  in  current  dollars,  square  footage  of  floor 
area,  number  of  dwelling  units,  and  number  of  projects.  Only 
the  geography  is  varied.  These  data  can  be  generated  for  any 
county,  any  group  of  counties,  such  as  SMSA  or  BEA  areas, 
States,  groups  of  States,  etc. 

All information is identically offset on 8 1/2 by 11 inch
output formats and is made available to users approximately
30 days after the end of each reporting period. (Specimens of
the service are being distributed to the audience.¹)

¹See exhibit.

The new data service reflects the recognition that construction
is an important component of all economic activity. This
quantitative measure will be particularly appreciated because of
its  availability  for  any  geographic  configuration.  It  is  expected 
that  Dodge  Local  Construction  Potentials  will  improve  the 
management/marketing  function  of  dealers,  distributors,  whole- 
salers, etc.,  who  sell  a  construction  related  product  or  service 
in  a  fairly  compact  area  through  a  sales  force.  Through  it  they 
will  better  be  able  to  align  their  sales  territories,  and  understand 
their  changing  market  potentials  and  market  penetrations.  It 
will  help  in  the  planning  of  inventories,  cash  flow,  and  the 
location  of  warehouses  and  manufacturing  facilities. 


Additional uses for small-area construction statistics will
be  made  by  public  utilities,  financial  institutions,  local  and 
regional  governments,  all  of  which  have  a  variety  of  reasons 
for  measuring  in  their  particular  geographic  configurations  the 
changing  patterns  of  the  construction  component. 

This new service was commenced a few short months ago, in
April 1980. Undoubtedly, additional uses, as yet unanticipated,
will be made of this new service.

Thank  you  for  permitting  me  to  outline  Dodge  Local  Con- 
struction Potentials  for  you  and  for  your  attention. 
SUMMARY OF CONSTRUCTION CONTRACTS FOR NEW, ADDITION & MAJOR ALTERATION PROJECTS


SQUARE FEET AND VALUE IN THOUSANDS
CURRENT MONTH


Stores & Other Mercantile Bldgs
Warehouses (Excl. Mfr. Owned)
Office & Bank Bldgs
Commercial Garages & Service Stations
TOTAL COMMERCIAL BLDGS


PRDJS 
5 
i 
» 
0 
19 


THIS    YEAR 
I    SQ     KET  I 


83 

0 
256 


VALUE 
*,555 
765 
4,042 
0 
9,362 


LAST  YEAR 
I  VALUE 
1.624 
1,«0 
2,103 
57 
5,704 


NUMBER    OF  PROJECTS 

THIS    YEAR    I  LAST    YEAR 

104  152 

63  92 

110  140 

18  24 

295  408 


CUMULATIVE    TO    DATE    ("I 

SQUARE    FEET  I 

THIS    YEAR  I     LAST    YEAR  |  THIS 


1,001 
450 
514 
410 

2,375 


689 
♦21 
753 
328 
2,691 


YEAR 
,278 
i511 
,780 
1 740 
,309 


VALUE 
I  LAST 
26 
15 
33 
6 
81 


YEAR  I       ' 

,254 

,191 

,883 

,271 

,599  «» 


Miiilictuting   Plants 
Wirahoisas-IMIi     Oxnedl 
Labotaioiies  IMfc    Owned) 
TOTAL    MANUFACTURING    BLDGS 


121 
46 
70 

237 


3,477 

910 

2,250 

6,637 


2,897 

690 

0 

3,587 


51 

11 

3 

65 


69 

31 

1 

101 


835  1,047 

129  428 

370  0 

1,334  1,475 


34, 

2, 

91, 

128, 


185 
876 
150 
211 


,221 
,488 
,000 
,709 


School   &   College    Classroom  BI09S. 
Laboralories   (Eicl     Mir     Owned) 
Libraries    Museums,   etc. 
TOTAL    EDUCATIONAL    i    SCIENCE    BLDGS 


,185 
0 
0 

,185 


193 

0 

1.057 

1.250 


40 
0 

4 
44 


38 
1 
3 

42 


194 

0 

11 

205 


311 

3 

21 

335 


,179 

0 

714 

.893 


,743 
192 
,324 
,259  -34 


TOTAL    HQSP     &    HEALTH    TREAT 

BLDGS. 

1 

12 

480 

110 

21 

25 

267 

416 

21,676 

48,271     -55 

Covernenl   Admioislialion   BIdgs 

0 

0 

0 

0 

6 

10 

7 

21 

971 

1.683 

Oilier   Conermenl   Service   BIdgs. 

1 

0 

45 

110 

5 

20 

4 

46 

290 

4.467 

TOTAL    PUBLIC    BIOGS 

1 

0 

45 

110 

11 

30 

11 

67 

1,261 

6.350     -80 

Houses  ol    Worship 

1 

3 

189 

933 

18 

20 

80 

149 

3,350 

5,074 

Olher   Religious   BIdgs 

0 

0 

0 

0 

9 

S 

29 

67 

1,948 

1.994 

TOTAL    RELIGIOUS    BLDGS 

1 

3 

189 

933 

27 

26 

109 

21« 

5,298 

7,0*8     -25 

TOTAL    AMUSE  T.  RECREAT.&COMMUN. BLDGS 

3 

9 

543 

340 

38 

46 

412 

222 

21,800 

8,344  ++ 

TOTAL    MISCELLANEOUS    NONRES 

BLDGS 

5 

20 

524 

341 

25 

21 

311 

186 

8,433 

3.666   ^-^ 

TOTAL   NQN    RESIDENTIAL    BUILDINGS 

40 

537 

18,965 

12,375 

526 

699 

5,024 

5,608 

286.881 

218,266     +31 

Oie-Famrly   Houses 

204 

351 

13 

111 

15 

388 

2 

232 

4 

,217 

3 

857 

8 

188 

141.327 

2*0,515 

Two-Famrly   Houses 

15 

36 

1 

133 

2 

128 

324 

281 

838 

707 

25,875 

19,514 

Apaniaeni   BIdgs 

11 

63 

2 

275 

2 

266 

198 

216 

2 

055 

2 

242 

66,741 

60,218 

TOTAL    HQUSEKPG     RESIDENTIAL    BIOGS.    (al 

230 

450 

16 

519 

19 

782 

2 

754 

4 

,714 

6 

750 

11 

137 

233.943 

340,247 

-31 

Hotels  t   Moiels 

0 

0 

0 

0 

1 

10 

0 

94 

100 

10,109 

Dornilories 

0 

0 

0 

400 

2 

7 

3 

12 

150 

1,892 

TOTAL    NONHOUStKPG     RESIDENTIAL   BLDGS. 

0 

0 

0 

♦00 

3 

17 

3 

i06 

250 

12,001 

-98 

TOTAL    RESIDENTIAL    BUILDINGS 

230 

450 

16 

519 

20 

182 

2 

757 

4 

731 

6 

753 

11 

243 

234,193 

352.248 

-34 

TOTAL    BUILDING    <Res     &    NonRes  ) 

270 

987 

35 

484 

32 

557 

3 

283 

5 

430 

11 

777 

16 

851 

521,074 

570.514 

-  9 

Streets  &    Highways 

5 



412 

3 

659 

141 

154 





23,324 

31,425 

Bridges  Unci     Elev    Highways   &   Railways) 

0 



0 

1 

805 

14 

10 





27,802 

2,904 

Dams  &   Reservoirs 

0 



0 

0 

1 

2 





129 

44 

River   i   Harbor    Dev    (Eicl     Dams   (   Res.) 

2 



67 

1 

895 

41 

36 





2,821 

4,SS6 

Sewerage   &   Waste   Disposal   Systems 

0 



0 

27 

25 

42 





5,849 

34.769 

Waler   Supply   Systems   (Excl     Dams  &   Res.) 

0 



0 

622 

25 

37 





7,026 

8.834 

Elec    Power   &    Hig    Sysl    (Excl     Dams   &   Res 

)      0 



0 

23 

5 

8 





294 

3,048 

Gas  Systems  (Natural   k   Manufactured) 

0 



0 

0 

4 

1 



— 

801 

5* 

Cofflmanicalion    Systems 

0 



0 

118 

1 

3 





50 

136 

Missile  &   Space   Facilities 

0 



0 

0 

0 

0 





0 

0 

Airports  <Eicl     BIdgs) 

0 



0 

0 

5 

4 





121 

»17 

Mrsccllateous  Non-Building   Construction 

3 



1 

292 

359 

70 

94 





5,961 

8,187 

TOTAL    NON-BUILOING    CONSTRUCTION 

10 

— 

1 

771 

8 

508 

332 

391 

— 

— 

74,178 

95,206 

-22 

TOTAL    CONSTRUCTION 

280 

987 

37 

255 

41 

065 

3 

615 

5 

,821 

11 

777 

16 

851 

595,252 

**5,720 

-11 

la)    NUMBER    OF    DWELLING    UNITS    IN 
HOUSEKEEPING    RESIDENTIAL    BUILDINGS- 


Oae  Family  Houses 
Two  Family  Houses 
Apartmtal   BIdgs. 

TOTAL    NUMBER   OF    DWEUIN6   UNITS. 


CURRENT    MONTH 


CUMULATIVE    TO    DATE 


THIS    YEAR  I  LAST    YEAR  |  THIS    YEAR  JLAST    YEAR  |     % 


204 
30 
54 

288 


251 
«* 
117 


2.231 

648 

1.822 


4,213 
5*2 

2,101 


434         4,701         4.876 


-47 
+  15 
-13 


NOTCS        The   f 


CopY'igfi'    ®    198  1    McGriw- 

Tn.i    rw«M     .(   cox'.Mfil.BI     «mr«*^.< 
frmMM  eni*  br  cwiiract  •'  ptio'    writi 


Synthetic  Estimates  for  Local  Areas  From 
the  Health  Interview  Survey* 

Ralph  DiGaetano,  Westat,  Inc. 
Ellen  MacKenzie,  Johns  Hopkins  University 
Joseph  Waksberg,  Westat,  Inc. 
Richard  Yaffe,  Health  Care  Financing  Administration 


INTRODUCTION 

Under the National Health Planning and Resources Development
Act of 1974, Health Systems Agencies (HSA's) are required to
develop a 5-year Health Services Plan (HSP) that covers overall
goals and long-range objectives, and an Annual Implementation
Plan (AIP) that outlines specific activities for the coming year.
To carry out these activities, the HSA's seek relatively recent
and reliable data on the health status and needs of the
community as well as on its patterns of health services
utilization. The Planning Act specifically requires the HSA's to
collate and analyze data which are currently available.

One source of data relevant to the planning activities of the
HSA's that has generated interest is the Health Interview
Survey (HIS) of the National Center for Health Statistics (NCHS).
The HIS collects data from a continuing nationwide probability
sample of households. Information is available concerning illness,
injuries, impairments, disability, and the utilization of health
services for the civilian, noninstitutionalized population of the
United States. The sample design and size (approximately
40,000 households per year), however, permit reliable estimates
to be calculated only for the United States as a whole, for four
broad geographic regions, and perhaps for certain large standard
metropolitan statistical areas (SMSA's). The sample size is not
sufficient to allow reliable estimates to be made on health
variables for most HSA's or sub-HSA areas. This problem has
been recognized for some time, and there has been considerable
developmental work done on statistical procedures that can be
used to develop estimates for small areas using national data
sources. One such procedure, synthetic estimation, has been
used by NCHS to develop State estimates of disability and
utilization of medical services from the HIS data.¹ Other
analysts have used multiple regression to generate small-area
data.² The utility of these statistical methods for health
planning at the local (HSA) level remains largely untested.
Neither synthetic nor regression estimates applied to local areas
are unbiased, and the extent to which they are biased will affect
their utility for planning purposes.

The  National  Center  for  Health  Statistics  is  engaged  in 
assessing  the  applicability  of  these  techniques  for  imputing 
estimates  of  HIS  variables  from  national  or  regional  data  for 
small  areas  and  has  awarded  a  contract  for  such  an  evaluation  to 
the  Health  Services  Research  and  Development  Center  of  the 
Johns  Hopkins  Medical  Institutions.  Westat,  Inc.,  is  acting  as  a 
subcontractor  to  Johns  Hopkins  for  the  study.  This  paper 
contains  a  description  of  the  methods  used  to  evaluate  the 
estimating  techniques  and  a  preliminary  analysis  of  the  results 
available  to  date.  This  includes  an  evaluation  of  the  quality  of 
synthetic  and  regression  estimates  through  an  examination  of 
the  HIS  data  alone.  In  addition,  the  paper  contains  further 
assessments  of  these  estimates  made  through  comparisons  with  a 
random digit dialing telephone survey of about 2,500 households
in the Baltimore SMSA. The phone survey was carried out
in  conjunction  with  this  project. 


*Supported in part by Contract No. 223-78-2052 from the National
Center for Health Statistics.

¹National Center for Health Statistics: Synthetic State Estimation of
Disability. PHS Publication #1759, Public Health Service, Washington,
D.C., 1968.

Namekata, T., Levy, P. S., and O'Rourke, T. W.: Synthetic Estimates
of Work Loss Disability for Each State and the District of Columbia.
Public Health Reports 90:532-538, 1975.

National Center for Health Statistics: "Synthetic Estimation of State
Health Characteristics Based on the Health Interview Survey," by P. S.
Levy and D. K. French. Vital and Health Statistics, Series 2, No. 75.
DHEW Pub. No. (HRA) 78-1349. Health Resources Administration,
Washington, D.C., U.S. Govt. Printing Office, October 1977.

²Gonzalez, M. E., and Hoza, C.: "Small-Area Estimation with
Applications to Unemployment and Housing Estimates," Journal of the
American Statistical Association, 73, 1978.

METHODS-COMPUTING SYNTHETIC AND
REGRESSION ESTIMATES

The technique of synthetic estimation involves applying
national or regional estimates of the characteristic being
measured for specific population subgroups to the local area's
population composition. The simplest form of synthetic
estimation, and the one for which the name is usually reserved,
requires computation of a weighted average of the mean values
of the characteristics in the subgroups with weights that are
proportional to the distribution of the subgroups in the
small-area population. A more general approach involves
regression analysis. In this approach, the national or regional
data are used to estimate a regression equation which relates the
independent variables which define the population subgroups to
the characteristic of interest. The values of the regression
variables for the small area are then used in the equation to
obtain estimates of the characteristic for that small area.
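
The two computations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subgroup rates, the local population shares, the single predictor (percent of persons 65 and over), and all function names are hypothetical and are not taken from the HIS or from the programs used in the study. For brevity the sketch uses four age groups and one regression variable, whereas the study used race-sex-age cells and several regression variables.

```python
# Hypothetical illustration only: every number below is made up.

def synthetic_estimate(group_rates, local_shares):
    """Weighted average of national subgroup rates, with weights
    proportional to the local area's population in each subgroup."""
    total_share = sum(local_shares.values())
    weighted = sum(group_rates[g] * local_shares[g] for g in local_shares)
    return weighted / total_share


def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# National rate per person in each (hypothetical) age group.
national_rates = {"under_15": 4.0, "15_to_44": 3.0, "45_to_64": 5.0, "65_plus": 8.0}
# Share of the local area's population in each group.
area_shares = {"under_15": 0.25, "15_to_44": 0.45, "45_to_64": 0.20, "65_plus": 0.10}
area_synthetic = synthetic_estimate(national_rates, area_shares)

# Regression fitted across (hypothetical) PSU's, then applied to the local area.
psu_pct_over_65 = [8.0, 10.0, 12.0, 14.0]
psu_rates = [4.0, 4.4, 5.0, 5.4]
intercept, slope = fit_simple_regression(psu_pct_over_65, psu_rates)
area_regression = intercept + slope * 11.0  # local area: 11 percent aged 65 and over

print(round(area_synthetic, 2), round(area_regression, 2))
```

In both cases the local area contributes only its composition (the shares, or the value of the regression variable); the rates themselves come entirely from the national data.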

In  the  study  discussed  in  this  paper,  both  techniques 
described  above  have  been  used  to  derive  estimates  for  40  key 
health  variables  selected  from  the  basic  HIS  questions  and  the 
supplemental  questions  for  1978.  The  selection  of  variables  was 
dictated  by  data  requirements  of  the  HSA  and  the  need  to  have 
an  adequate  range  of  different  types  of  variables  for  which  the 
use  of  synthetic  estimation  could  be  evaluated. 

Simple  synthetic  estimates  for  these  40  dependent  variables 
were  derived  for  a  basic  set  of  demographic  variables  (age,  sex, 
and  race).  Additional  independent  variables  used  to  obtain 
regression  estimates  are  of  two  types:  variables  that  are  only 
available  in  census  years  or  for  which  estimates  might  be 
available  from  other  surveys  or  an  inexpensive  survey,  and 
variables  obtained  from  the  Area  Resource  File.  Variables  used 
in  the  present  analysis  include  proportion  of  people  in  a  PSU  in 
households  where  the  head  of  household  completed  high  school, 
proportion  of  persons  in  a  PSU  who  are  heads  of  households 
and  also  are  either  farm  or  blue  collar  workers,  proportion  of 
persons  in  a  PSU  over  65,  proportion  of  non-White  population 
in  a  PSU,  per  capita  income  in  a  PSU,  number  of  hospital  beds 
per  100,000  population  in  a  PSU,  proportion  of  people  in  a  PSU 
who  are  over  17  years  old  and  married,  and  number  of  M.D.'s 
per  100,000  population  in  a  PSU.  Estimates  were  prepared  using 
all  356  Primary  Sampling  Units  (PSU's)  in  the  National  HIS  and 
have  been  applied  to  the  6  counties  in  the  Baltimore  SMSA  as 
well  as  the  20  largest  SMSA's. 

EVALUATION  METHODS 

Three  methods  are  being  used  to  evaluate  the  various 
estimates: 

•  Comparison of the results of a telephone survey in the
Baltimore HSA (and counties within it) with the various
synthetic or regression estimates for the same areas;

•  Comparison  of  the  synthetic  and  regression  estimates  for 
individual  PSU's  with  the  direct  HIS  estimates  for  the  same 
areas,  and 

•  Calculation  of  average  mean  square  errors  of  the  synthetic 
and  regression  estimates. 


Comparison  with  Telephone  Survey 

Since  the  telephone  survey  concentrated  on  a  group  of  items 
that  were  also  collected  in  HIS,  direct  comparisons  are  possible 
for  synthetic  and  regression  estimates  of  these  items  with 
statistics  for  the  same  items  from  the  telephone  survey.  This  is  a 
straightforward  method  of  evaluation.  For  each  item  studied, 
the  comparison  with  the  survey  estimate  serves  as  a  guide  to  the 
accuracy  of  the  synthetic  or  regression  estimate.  Although  such 
a  comparison  provides  important  information  on  the  accuracy 
of  the  estimate,  it  is  subject  to  several  limitations.  First,  it 
assumes that HIS results are comparable to those from
telephone interviewing (more specifically, the particular
procedures used in the Baltimore telephone survey). Secondly, the
time  periods  are  different.  There  are  sizeable  seasonal  variations 
for  some  of  the  statistics  which  complicate  the  comparisons. 
Finally,  such  a  comparison  is  only  possible  for  the  Baltimore 
HSA  and  for  counties  within  it.  The  extent  to  which  the 
Baltimore  experience  is  typical  of  other  areas  in  the  United 
States  is  uncertain. 

We  believe  these  limitations  do  not  seriously  affect  the 
resulting  analyses.  Other  studies  have  indicated  that  in  most 
cases  telephone  surveys  produce  data  quite  similar  to  personal 
interviews.  In  regard  to  seasonal  factors,  some  information  on 
seasonal  variation  is  available  from  the  HIS.  A  later  report  will 
attempt  to  adjust  for  the  seasonal  differences. 

It  may  be  helpful  to  detail  some  particulars  of  the  telephone 
survey  relative  to  HIS.  The  phone  survey  attempted  to  simulate 
HIS  to  as  great  a  degree  as  possible  (using  the  same  questions, 
training  procedures  for  interviewers,  etc.).  The  major  differences 
were: 

1.  Only one respondent was used per family within a household
in the phone survey, this respondent providing information
on all other family members. HIS encourages every adult in
the family to participate in a group session, as it is an
in-person interview arrangement.

2.  Some  questions  in  HIS  require  the  interviewer  to  show  cards 
to  respondents.  For  the  phone  survey,  cards  were  mailed  to 
some  respondents  after  initial  contact  as  an  experiment.  For 
the  60  to  65  percent  of  the  respondents  who  did  not  use 
cards,  some  HIS  questions  had  to  be  modified. 

3.  Not  all  HIS  questions  for  any  one  year  were  used  in  the 
phone survey. A single interview by phone required
approximately 30 minutes, while an HIS interview requires
approximately 1 hour.

4.  Non-telephone  households  are  naturally  excluded  from  a 
phone  survey. 

5.  Clusters  of  households  using  the  random  digit  dialing  design 
of  the  phone  survey  differ  in  nature  from  the  clusters  of 
households  on  a  city-block  approach  used  in  HIS. 

6.  Interviewers in the phone survey were closely monitored,
and there existed a great deal more communication among
interviewers working out of a central location than is
possible with HIS.

The response rate for the phone survey was 76 percent.
There were 2,470 households in this survey consisting of 7,013
people. There were 15 primary interviewers in the study, each
logging at least 100 interviewing hours. Information on
approximately 1,200 people was obtained from the Baltimore PSU for
HIS. HIS uses a single interviewer for the Baltimore PSU.

Comparison  with  Direct  HIS  Estimates 

In  the  larger  PSU's,  the  HIS  sample  size  is  sufficient  to 
provide  the  data  with  fair  reliability.  For  the  20  largest  SMSA's 
in  the  United  States,  we  have  compared  direct  HIS  estimates 
with  those  prepared  for  the  same  areas  using  synthetic  or 
regression  techniques.  For  the  regression,  this  is  equivalent  to 
examining  the  distances  the  observed  values  are  from  the 
regression  values. 

Average  Mean  Square  Error 

The evaluation method described in the above section is
subject to three limitations:

•  It  can  only  be  applied  to  the  larger  PSU's.  The  situation  for 
smaller,  largely  rural,  PSU's  may  be  quite  different. 

•  In  making  comparisons  for  a  group  of  areas,  there  are  bound 
to  be  variations  among  the  areas  in  the  amount  of  difference 
between  the  direct  estimate  and  the  synthetic  or  regression 
estimate.  A  method  is  needed  of  summarizing  the  results  so 
that  a  conclusion  can  be  reached  on  whether  or  not  the 
estimates  are  satisfactory. 

•  The  difference  between  a  direct  HIS  estimate  and  a  synthetic 
or  regression  estimate  reflects  two  sources  of  error:  (a)  the 
inaccuracy  of  the  synthetic  or  regression  estimate;  and  (b) 
sampling  error  in  the  HIS  estimate.  It  is  desirable  to 
eliminate  the  effect  of  the  HIS  sampling  error  in  the  overall 
evaluation. 

The average mean square error (AMSE) overcomes these
three limitations. Using synthetic estimation terminology, the
average mean square error is defined as

$$\mathrm{AMSE} = \frac{1}{M} \sum_{i=1}^{M} \left( \tilde{U}_i - U_i \right)^2 \qquad (3.1)$$

where

$\tilde{U}_i$ is the synthetic estimate for area i;
$U_i$ is the true value in area i; and
$M$ is the number of areas.

The AMSE can be thought of as having characteristics similar
to those of sampling variances. That is, the chances are
about 2 out of 3 that the synthetic estimate will lie within
plus or minus the square root of the AMSE of the true value,
and about 19 out of 20 that it will lie within plus or minus
twice the square root of the AMSE, etc.


Of course, in practical situations the value of $U_i$ is not
known. Gonzalez and Waksberg³ have shown that the AMSE of
a rate per person can be estimated by

$$\frac{1}{M} \sum_{i=1}^{M} \Big[ \sum_{j} P_{ij} \left( U_j - U_{ij} \right) \Big]^2 - \frac{1}{M} \sum_{i=1}^{M} \sum_{j} P_{ij}^2 \, \sigma_{ij}^2 \qquad (3.2)$$

where

$j$ is an index for the sex-age-etc. groups used in the
synthetic estimates;

$P_{ij}$ is the population proportion in the i-th PSU in the
j-th sex-age-etc. category;

$U_j$ is the survey estimate of the rate per person in the
j-th demographic group;

$U_{ij}$ is the survey estimate of the rate per person in the
j-th demographic group in the i-th PSU; and

$\sigma_{ij}^2$ is the sampling variance for the item within the i,j
category.

Calculations of the AMSE have been carried out for the items
for which synthetic estimates are prepared.

A similar type of analysis can be made for regression
estimates.⁴ With regression estimates, the sum of squares of the
residuals from the line of regression replaces the first term of
equation (3.2). The second term remains the same.
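
A minimal sketch of how an estimate of this kind might be computed is given below. It follows the structure described above (a mean squared difference between synthetic and direct estimates, less the mean sampling variance of the direct estimates), but the function name, the data layout, and all numbers are hypothetical; it is not the program used in the study. Because the second term is subtracted, the estimate can come out negative in small samples, so the sketch floors it at zero before taking the square root; that floor is an assumption of the sketch, not part of the published estimator.

```python
import math


def estimated_amse(areas, national_rates):
    """Estimate the AMSE of synthetic estimates for a set of areas.

    Each area is a dict with, per demographic group j:
      shares       population proportion of the area in group j
      local_rates  direct survey estimate of the rate in group j
      variances    sampling variance of that direct estimate
    """
    m = len(areas)
    mean_squared_diff = 0.0
    mean_sampling_var = 0.0
    for area in areas:
        shares = area["shares"]
        # Synthetic estimate minus direct estimate for this area.
        diff = sum(
            p * (u - u_local)
            for p, u, u_local in zip(shares, national_rates, area["local_rates"])
        )
        mean_squared_diff += diff ** 2 / m
        # Sampling variance of the direct estimate for this area.
        mean_sampling_var += sum(
            p ** 2 * v for p, v in zip(shares, area["variances"])
        ) / m
    return mean_squared_diff - mean_sampling_var


# Hypothetical data: two demographic groups, two areas.
national_rates = [2.0, 4.0]
areas = [
    {"shares": [0.5, 0.5], "local_rates": [3.0, 5.0], "variances": [0.04, 0.04]},
    {"shares": [0.5, 0.5], "local_rates": [1.0, 4.0], "variances": [0.04, 0.04]},
]
amse = estimated_amse(areas, national_rates)
root_amse = math.sqrt(max(amse, 0.0))
print(round(amse, 3), round(root_amse, 3))
```

For regression estimates the same second term would be kept and the first term replaced by the mean squared residual, as the text notes.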

AVAILABLE  RESULTS 

The  necessary  computations  have  been  completed  for  22  of 
the  40  items  in  the  program,  and  basic  information  on  the 
quality of the synthetic and regression estimates for these items
is shown in the attached tables. Similar information for the
other  18  items  will  become  available  at  a  later  time.  Even  for 
the  22  items,  the  discussion  and  explanation  for  the  statistics 
that  have  been  produced  should  be  considered  preliminary. 
Further  analysis  of  the  data  is  continuing,  and  the  additional 
work  that  is  planned,  described  in  the  next  section,  may  shed 
new  light  on  the  results. 

However,  even  with  the  limited  analyses  done  to  date,  some 
conclusions  appear  clear,  and  we  believe  it  is  unlikely  that  they 
will  be  revised  when  the  additional  information  becomes 
available.  The  main  conclusion  is  that  there  is  considerable 
variation  in  the  quality  of  the  estimates  among  the  health- 
related  items  studied,  and  for  many  of  the  items,  neither 
synthetic  nor  regression  estimates  produce  very  reliable  data  for 
areas  of  the  size  of  typical  HSA's.  At  least  this  is  true  with  the 
techniques  used  for  this  project.  The  errors  are  probably  even 
larger  for  areas  the  size  of  counties,  although  further  evidence  is 
needed  on  this.  The  extent  to  which  such  data  can  be  used  for 
policy analysis and decisions depends, of course, on the degree
of accuracy needed for these uses. The implications of high
sampling errors with respect to the applicability of synthetic
estimates are a subject which needs to be dealt with separately.
Such issues are beyond the scope of this paper.

³Gonzalez, Maria, and Joseph Waksberg, "Estimation of the Error of
Synthetic Estimates," prepared for presentation at the first meeting of
the International Association of Survey Statisticians, Vienna, Austria,
August 18-25, 1973.

⁴Gonzalez, Maria, and Hoza, C., "Small-Area Estimation with
Applications to Unemployment and Housing Estimates," Journal of the
American Statistical Association, 73, 1978.

Before  describing  the  data  leading  to  these  conclusions,  let  us 
give  the  specific  estimating  techniques  used,  in  somewhat  more 
detail  than  described  earlier.  The  synthetic  estimates  were 
prepared  by  calculating  national  rates  per  person,  separately  by 
race-sex-age,  and  applying  them  to  the  best  estimates  of  the 
population  by  race-sex-age  in  each  local  area.  The  race-sex-age 
classifications  consisted  of: 

Race:       White  vs.  Non-White; 

Sex:  Male  vs.  Female; 

Age:         Under  15,  15  to  44,  45  to  64,  65  and  over. 

The  population  estimates  were  the  most  current  estimates 
prepared  by  the  Census  Bureau.  At  the  time  this  work  was  done, 
the  Census  estimates  were  for  1977.  In  addition,  for  the 
Baltimore  SMSA,  other  population  estimates  prepared  by  the 
Baltimore  Regional  Planning  Council  were  also  obtained  and 
formed  the  basis  of  alternative  synthetic  estimates. 

For  regression  estimates,  nine  independent  variables  were 
used.  They  were: 

•  Synthetic   estimates   for  the   area   (using  census  population 
estimates); 

•  Mean   per  capita  income  in  1975  (also  census  estimates); 

•  Percent  of  blue-collar  workers; 

•  Percent  married  and  17  years  and  over; 

•  Percent  completed  high  school; 

•  Percent  65  years  old  and  over; 

•  Percent  non-White; 

•  Number  of  M.D.'s  per  100,000  persons;  and 

•  Number  of  hospital  beds  per  100,000  persons. 

As  is  common  in  multiple  regression,  in  general,  only  a  few 
independent  variables  made  an  important  contribution  to  the 
model,  and  those  are  the  only  ones  that  were  eventually  used  to 
create  estimates. 

Table  1  compares  synthetic  and  regression  estimates  of  each 
of  the  22  items  for  the  Baltimore  SMSA  with  both  the  results  of 
the  telephone  survey  and  the  direct  HIS  estimates  for  Baltimore. 
Synthetic  and  regression  estimates  are  fairly  close;  the  two,  of 
course,  are  not  independent  since  the  synthetic  estimate  variable 
was  usually  one  of  the  independent  variables  making  an 
important  contribution  to  the  regression.  For  many  items, 
synthetic  and  regression  estimates  are  quite  close  to  the  results 
of  the  telephone  survey.  However,  there  are  quite  wide 
differences  in  a  few  cases.  Differences  of  20  to  25  percent  are 
not  unusual,  and  there  is  a  difference  of  50  percent  for  one  item 
(visits  to  emergency  rooms  per  person  per  year).  These 
differences  are  generally  far  beyond  the  possible  effects  of 
sampling  error. 


However, a surprising feature of table 1 is that there are even
greater differences between the results of the telephone survey
and the direct HIS estimates for Baltimore. They are also
beyond any reasonable effects of sampling errors. There seems
to be no obvious explanation of these differences. Some part of
the differences could be due to the fact that the direct HIS
covered the year 1977 while the telephone survey was
conducted during the last few months of 1979 and January 1980. It
does not seem likely that health characteristics changed enough
over this period to account for much of the difference. There is
definite seasonal variation for some of the items studied, and
this probably explains more of the differences, but it still is
far from accounting for most of them. We
thought it possible that there might be major differences in the
age-sex-race composition of the telephone and direct HIS
samples, due to a combination of sampling variation and
differential response rates, and that this could be a partial
explanation. However, as can be seen in table 2, such differences
did not occur.

The  HIS  conducts  interviews  on  a  face-to-face  basis,  but  we 
doubt  that  the  differences  in  interviewing  techniques  contribute 
importantly  to  the  differences.  The  question  wording  in  the  two 
interviews  was  essentially  identical.  The  differences  are  quite 
puzzling,  but  as  we  will  indicate  later,  we  do  not  believe  they 
vitiate  the  use  of  the  telephone  survey  as  an  evaluation  tool  of 
regression  and  synthetic  estimates. 

Table  3  shows  data  similar  to  table  1,  but  for  each  county  in 
the  Baltimore  SMSA.  For  most  items,  the  synthetic  and 
regression  estimates  are  roughly  similar  to  the  results  of  the 
telephone  survey.  However,  they  do  not  seem  to  discriminate 
among  counties  well.  For  instance,  if  one  ranked  the  various 
counties  by  size  of  the  estimates,  for  most  items  rankings  of  the 
telephone  survey  would  not  conform  very  closely  to  synthetic 
or  regression  estimates.  There  are  a  few  items,  however,  for 
which  the  synthetic  and  regression  estimates  come  closer  to  the 
results  of  the  telephone  survey.  These  are  generally  items  with 
large  differences  between  the  Black  and  White  population.  For 
such  items,  Baltimore  city  data  are  quite  different  from  the  rest 
of  the  SMSA,  and  these  differences  persist  for  all  estimators. 

Table 4 shows major characteristics of the regression
estimator. As indicated earlier, although the regression
computations started with nine independent variables, a much smaller
number was actually used for most items. A step-wise regression
program was initially utilized, with all nine variables. For each
item, a smaller number of variables accounting for virtually the
entire R² were selected and used to prepare the estimates.

The  variables  used  for  each  item  are  shown  in  the  second 
column  of  table  4.  We  were  surprised  by  the  variables  that  show 
up  as  important  for  most  items.  Synthetic  estimates  appear  as 
an  important  variable  for  only  about  half  the  items.  We  would 
have  expected  it  to  be  more  prominent.  Number  of  hospital 
beds  per  100,000  population  is  an  important  variable  for  some 
of  the  hospital-related  statistics,  but  not  all.  Demographic 
characteristics such as percent of blue-collar workers and
percent with a high school education appear more often than we
would have expected.

The next to last column shows the contribution each variable
makes to the total regression estimate. Where synthetic
estimates appear as a variable, it is usually the dominant variable,
frequently (although not always) accounting for 70 or 80
percent of the part of the estimates added to the intercepts.
This makes it even more puzzling that it does not appear for
more items. It is possible that intercorrelations among variables
complicate the choice of dominant variables. We have not yet
had the opportunity to examine them.

The last column of table 4 shows the R² for each item, and it
can be seen that they are quite low. The highest R² is 0.30, and
there are a few that are about 0.20. The rest are lower. The low
values of R² explain the poor ability of regression estimates to
simulate the telephone survey in Baltimore. It is possible that a
model which includes interaction terms or nonlinear
relationships may work better. Such models were not examined in this
study.
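
As a rough guide to what an R² of this size means, one can use a standard least-squares identity (it is not a result from the study): for a regression fitted with an intercept, the root mean square of the residuals equals the root mean square deviation of the dependent variable about its mean multiplied by the square root of (1 - R²). For the highest value reported,

$$\sqrt{1 - R^2} = \sqrt{1 - 0.30} \approx 0.84,$$

so even for the best-fitting item the residual spread among PSU's is still about 84 percent of the original spread. Part of that spread is sampling error in the direct HIS estimates, which no set of regression variables could remove.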

Table 5 shows the reason for the similar poor predictive
ability for the synthetic estimates. The root average mean
square error has been expressed as a proportion of the estimate.
The results are shown in the last column. The relative root mean
square error can be thought of as the analogue of the coefficient
of variation of a sample survey.

The relative errors are generally in the range of 0.2 to 0.5. A
few are as high as 1.0. A relative error of 0.5 implies that when
synthetic estimates are prepared for a set of areas, one can
expect approximately one-third of the areas to have an error of
more than 50 percent of a census value (assuming the errors are
roughly normally distributed).
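
The arithmetic behind the last column of table 5, and behind the "one-third" figure, can be sketched as follows. The relative root mean square error is the square root of the average mean square error divided by the U.S. estimate. The one-third figure is reproduced here under the assumption, not stated in the text, that errors are roughly normal with a standard deviation equal to the root mean square error. The function names are illustrative.

```python
import math

# (U.S. estimate, average mean square error) as printed in table 5.
ITEMS = {
    "restricted-activity days": (17.8, 68.86),
    "bed-disability days": (6.9, 11.55),
    "dental visits": (1.6, 0.7217),
}


def relative_rmse(estimate, average_mse):
    """Root mean square error expressed as a proportion of the estimate."""
    return math.sqrt(average_mse) / estimate


def share_beyond_one_sd():
    """P(|Z| > 1) for a standard normal variable, about one-third."""
    return 1.0 - math.erf(1.0 / math.sqrt(2.0))


for name, (estimate, mse) in ITEMS.items():
    print(f"{name}: {relative_rmse(estimate, mse):.3f}")
print(f"share of areas beyond one RMSE: {share_beyond_one_sd():.3f}")
```

With a relative error of 0.5, one root mean square error equals 50 percent of the estimate, so the normal tail share of about 0.32 corresponds to the "approximately one-third" in the text.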

Other  studies  have  shown  important  regional  differences  for 
some  types  of  health  characteristics.  It  is  possible  that  using 
regional  parameters,  rather  than  those  for  the  total  United 
States,  may  improve  synthetic  or  regression  estimates,  or  both. 
If  resources  permit,  they  will  be  examined  in  a  later  phase  of  the 
project. 

Table  6  contains  further  insight  on  the  poor  predictive  power 
of  the  synthetic  and  regression  estimates.  This  table  contains 
both  types  of  estimates,  as  well  as  the  direct  HIS  estimates  for 
the  largest  20  SMSA's.  Somewhat  more  than  20  areas  are  shown 
because  several  of  the  largest  SMSA's  have  been  split  up  into 
subareas.  We  have  selected  only  a  few  of  the  22  items  to  keep 
the  table  to  a  reasonable  size,  but  the  other  items  show  similar 
patterns. 

It can be seen that the range of variation among areas is
much narrower for synthetic and regression estimates than for
direct estimates. Furthermore, if one were interested in ranking
the areas by size for an item, in order to identify the
higher-valued or lower-valued areas, synthetic and regression
estimates would generally not simulate the results of sample
surveys. As was the case in Baltimore, the differences cannot be
attributed to sampling error. These results are consistent with
the findings of other studies.¹
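
One way to see why the Baltimore differences are hard to attribute to sampling error is to compare the difference between two estimates with its approximate standard error. The sketch below does this for the telephone survey and the direct HIS estimate, using the Baltimore SMSA values in table 1. It assumes that the two surveys are independent and that the approximate σ columns are standard errors; neither assumption is spelled out in the text, and the function name is illustrative.

```python
import math


def z_for_difference(est_a, se_a, est_b, se_b):
    """z statistic for the difference of two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Baltimore SMSA values from table 1:
# (telephone estimate, its sigma, direct HIS estimate, its sigma).
COMPARISONS = {
    "restricted-activity days": (18.15, 0.61, 10.34, 1.44),
    "bed-disability days": (7.58, 0.56, 4.31, 0.90),
}

z_scores = {name: z_for_difference(*vals) for name, vals in COMPARISONS.items()}
for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
```

Both differences exceed three standard errors, which is consistent with the statement that they cannot be attributed to sampling error.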


¹Schaible, Wesley; Brock, Dwight; and Schnack, George A., National
Center for Health Statistics; "An Empirical Comparison of the Simple
Inflation, Synthetic, and Composite Estimators for Small-Area
Statistics," American Statistical Association Proceedings of the Social
Statistics Section, 1977, Part II, pp. 1017-1021.


Table  6  contains  some  other  information  which  appears  to  be 
even  more  surprising  than  the  poor  performance  of  synthetic 
and  regression  estimates.  Synthetic  and  regression  estimates  for 
the  20  areas  appear  to  have  a  very  small  range  of  variation  due 
to  similarities  in  the  independent  variables  among  the  large 
metropolitan  areas.  However,  the  direct  estimates  seem  to 
encompass  a  much  wider  range  than  one  would  expect.  For 
example,  if  one  looks  at  the  number  of  visits  to  a  doctor's  office 
per  person  per  year,  the  direct  HIS  estimates  go  from  a  low  of 
1.80  for  a  PSU  of  New  York  to  a  high  of  5.39  for  a  PSU  of 
Philadelphia. The differences cannot be explained by different
demographic compositions of the areas or differences in the
characteristics used for the regressions. If they could be, then
synthetic and regression estimates would have better explanatory
power. They are also far beyond the limits of sampling error.
These are the largest self-representing PSU's in HIS and have
fairly large sample sizes.

The data seem to imply that, for the items studied, areas are
inherently very different. This would explain why predictors
based on demographic or economic information, such as the
synthetic and regression estimates utilized in this study, do not
have much power. However, the large differences among areas
appear surprising. The items selected are of a kind that one
would think are mostly quite stable. Some of the differences
among the areas are no doubt due to inherent geographic
variation. However, the dramatic nature of these differences
suggests that there may be problems in the ability of the HIS to
enforce uniform standards of interviewing. In most of the areas,
the HIS interviews were carried out by only a few interviewers,
and between-interviewer variability may be quite high. This
conjecture would help explain the large differences between the
direct HIS and the telephone survey in the Baltimore SMSA.
HIS, a vehicle designed primarily for obtaining national estimates
on health-related data, apparently does not provide direct
estimates which are stable enough for small-area estimation
needs.

If problems in the HIS are the major reasons for the
differences, then one can take a somewhat different attitude
towards synthetic and regression estimates. Measured against a
standard of the accuracy of data actually achievable in a survey
such as the HIS, synthetic and regression estimates may be of
acceptable quality for most practical uses. Further analysis in
this direction is necessary.


FUTURE  ANALYSIS 

The  analysis  discussed  above,  done  on  the  1977  HIS  data, 
will  also  be  done  for  the  years  1976  and  1978.  This  will  allow  us 
to  observe  the  sensitivity  of  the  parameters  to  sample  size.  We 
also  plan  to  repeat  the  same  analyses  for  all  three  years 
combined.  In  addition,  there  are  several  sets  of  items  which  are 
only  available  for  a  particular  one  of  the  years  1976-1978.  For 
example,  health  insurance  information  is  available  for  the  1976 
HIS.  Some  of  these  items  for  each  of  the  3  years  (approximately 
20  items  altogether)  will  be  examined. 

Pending available funding, we also hope to expand our
approach to the analysis in two other directions. First, we
expect to examine whether the introduction of an additional
cross-classification of the sex-race-age groups produces a
significant improvement in the synthetic and regression estimates.
Two categorical variables will be considered: degree of
urbanization (SMSA's over 1,000,000 population, smaller SMSA's, and
non-SMSA's) and census region (Northeast, North Central,
South, or West). If it seems useful, an urbanization-region
cross-classification may be introduced. Again, the data would be
evaluated by calculation of the appropriate mean square errors
and comparison with results of the telephone survey in
Baltimore, Maryland.
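
To make the role of the sex-race-age cells concrete, the sketch below computes a synthetic estimate in the usual way: national rates for each cell are weighted by the local population distribution. The local shares are the Baltimore HIS percentages from table 2. The national rates are HYPOTHETICAL values invented for illustration, and for brevity they vary by age only, whereas rates in a study of this kind would vary by sex and race as well. An additional cross-classification by urbanization or region would simply add another key to each cell. All names are illustrative.

```python
# Age groups, in the order used by table 2.
AGE_GROUPS = ["under 15", "15 to 44", "45 to 64", "65 and over"]

# Baltimore sex-race-age distribution (percent), HIS column of table 2.
LOCAL_SHARES = {
    ("male", "White"): [9.34, 17.99, 8.27, 3.00],
    ("male", "non-White"): [2.92, 4.10, 2.23, 0.92],
    ("female", "White"): [9.98, 16.88, 9.47, 3.85],
    ("female", "non-White"): [3.00, 5.14, 2.09, 0.84],
}

# HYPOTHETICAL national rates (days per person per year), by age only.
# These numbers are invented for illustration and come from no survey.
HYPOTHETICAL_RATES = {
    "under 15": 10.0,
    "15 to 44": 14.0,
    "45 to 64": 22.0,
    "65 and over": 35.0,
}


def synthetic_estimate(local_shares, national_rates):
    """Weight national cell rates by the local population distribution."""
    total_share = 0.0
    weighted_sum = 0.0
    for shares in local_shares.values():
        for age, share in zip(AGE_GROUPS, shares):
            total_share += share
            weighted_sum += share * national_rates[age]
    # Divide by the total so the shares need not sum to exactly 100.
    return weighted_sum / total_share


estimate = synthetic_estimate(LOCAL_SHARES, HYPOTHETICAL_RATES)
print(f"synthetic estimate: {estimate:.2f}")
```

Because areas differ only through their population shares, estimates built this way vary little among areas with similar demographic composition.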

Second,  we  intend  to  calculate  synthetic  and  regression 
estimates  and  average  mean  square  errors  for  specific  population 
subgroups  within  PSU's.  These  would  be: 

Sex:    Female
Age:    17 to 44
        65 and over
Race:   Non-White


Statistics  for  each  dependent  variable  would  be  calculated  for 
each  of  the  subgroups. 

Other  routes  of  investigation  we  may  pursue  include: 

1.  The  use  of  HIS  income  information  in  the  regression  models; 

2.  The effect of PSU estimates in the regression models, to be
examined by excluding PSU's from regressions and comparing
the results to those from a complete data base;

3.  The  validity  of  the  assumption  of  linearity  in  the  regression 
models; 

4.  The  preparation  of  estimates  based  on  varying  sizes  of  PSU's; 

5.  The utilization of past years' estimates as predictors of the
current year's estimates and the examination of other
methods for assessing autocorrelative effects; and

6.  The  construction  of  composite  estimators  consisting  of  a 
weighted  average  of  a  synthetic  (or  regression)  estimator  and 
a  direct  estimator. 
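
The composite estimator in item 6 can be sketched as follows. The weighting rule shown is one common choice, not necessarily the one the authors intend: it puts more weight on the direct estimate when its variance is small relative to the mean square error of the synthetic estimate, and it assumes the two errors are uncorrelated and the direct estimate is unbiased. The example uses the Baltimore restricted-activity figures from tables 1 and 5; the function names are illustrative.

```python
def mse_based_weight(direct_variance, synthetic_mse):
    """Weight on the direct estimate that minimizes mean square error,
    assuming uncorrelated errors and an unbiased direct estimate."""
    return synthetic_mse / (synthetic_mse + direct_variance)


def composite_estimate(direct, synthetic, weight_on_direct):
    """Weighted average of a direct and a synthetic (or regression) estimate."""
    if not 0.0 <= weight_on_direct <= 1.0:
        raise ValueError("weight_on_direct must lie between 0 and 1")
    return weight_on_direct * direct + (1.0 - weight_on_direct) * synthetic


# Restricted-activity days per person per year, Baltimore SMSA.
direct_his = 10.34          # direct HIS estimate (table 1)
direct_sigma = 1.44         # its approximate sigma (table 1)
synthetic = 17.97           # synthetic estimate, Maryland figures (table 1)
synthetic_mse = 68.86       # average mean square error (table 5)

weight = mse_based_weight(direct_sigma ** 2, synthetic_mse)
composite = composite_estimate(direct_his, synthetic, weight)
print(f"weight on direct estimate: {weight:.3f}")
print(f"composite estimate: {composite:.2f}")
```

With these inputs nearly all the weight falls on the direct estimate, so the rule is only as good as the assumption that the direct estimate's stated variance captures its real error.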

Table 1. Comparison of Alternative Estimates for the Baltimore SMSA

Columns:
  (1) Telephone survey: estimate
  (2) Telephone survey: approximate σ
  (3) Direct HIS estimate: SMSA
  (4) Direct HIS estimate: U.S.
  (5) Direct HIS approximate σ: SMSA
  (6) Direct HIS approximate σ: U.S.
  (7) Synthetic estimate based on Maryland population figures
  (8) Synthetic estimate based on census population figures
  (9) Regression estimate

Item                                          (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)

Restricted-activity days per person
  per year                                  18.15      .61    10.34    17.78     1.44      .25    17.97    18.11    20.29
Bed-disability days per person
  per year                                   7.58      .56     4.31     6.87      .90      .14     6.99     7.06     7.65
Work-loss days per person per year           4.11      .40     2.97     2.12      .82      .06     3.10     3.05     3.22
School-loss days per person per year         1.39      .19      .65     1.05      .24      .04      .96     1.01      .88
Proportion limited in activity               .162     .005     .119     .135     .013     .001     .134     .135     .138
Proportion unable to carry on
  major activity                             .040     .003     .033    *.033     .008     .001     .036     .037     .036
Proportion with limitation of
  activity for a duration of 1 year
  or longer                                  .134     .005     .106     (NA)     (NA)     (NA)     .115     .116     .118
Number of doctor visits per person
  per year (annual recall)                   3.30      .15     (NA)     (NA)     (NA)     (NA)     3.65     3.67     3.72
Proportion of people with one or
  more doctor visits in the last year        .746     .006     .766   **.752     .018     .002     .739     .740     .736
Number of dental visits per person
  per year                                   1.88     .137     1.38      1.6      .23     .027     1.54     1.53     1.64
Number of short-stay hospital
  episodes per 100 persons per year         12.58      .60     8.58     14.0     5.97     1.05    13.10    13.09    12.09
Number of short-stay hospital days
  per 100 persons per year                 105.37     7.94    83.14   109.20    18.97     2.51   103.20   103.92   118.85
Average length of stay in hospital           8.38      .74     9.69      7.8     7.03      .61     (NA)     (NA)     5.85
Proportion of persons with one or
  more hospital episodes in the last
  year                                       .108     .004     .074     .104     .007     .001     .105     .104     .093
Visits to doctor's office per person
  per year                                  2.878     .150    2.377    *3.48     .301     .045    3.289    3.290    3.570
Visits to emergency room per person
  per year                                   .164     .034     .314     (NA)     (NA)     (NA)     .245     .245     .241
Visits to out-patient clinic per
  person per year                            .616     .088     .835     (NA)     (NA)     (NA)     .484     .489     .526
Visits to general practitioners per
  person per year                           1.942     .125    1.640   *2.545     .309     .049    2.480    2.485    2.166
Visits to selected practitioners per
  person per year                            2.81     .172    2.765    *3.90     .348     .051     (NA)    3.709    3.818
Visits for diagnosis or treatment
  per person per year                       3.843     .201    3.200   *4.289     .381     .052    4.110    4.125    4.620
Visits for chronic condition per
  person per year                           1.713     .152    1.885    2.556     .355     .049    2.131    2.143    2.208

*1974 Estimates.

Table 2. Sex-Race-Age Distribution for U.S. and Baltimore SMSA Based on Telephone Survey and HIS Estimates

Columns:
  (1) U.S. estimate from HIS
  (2) Baltimore estimate from HIS
  (3) Telephone survey: Baltimore estimate before adjustment
  (4) Telephone survey: Baltimore estimate after adjustment

Sex, race, and age            (1)      (2)      (3)      (4)

MALE
  White
    Less than 15 years      10.32     9.34     8.03     8.02
    15 to 44 years          19.03    17.99    16.74    17.93
    45 to 64 years           8.77     8.27     8.44     7.72
    65 years and over        3.91     3.00     2.68     3.12
  Non-White
    Less than 15 years       2.08     2.92     3.11     3.24
    15 to 44 years           2.74     4.10     5.30     5.92
    45 to 64 years            .98     2.23     1.96     1.92
    65 years and over         .43      .92      .45      .66

FEMALE
  White
    Less than 15 years       9.85     9.98     8.15     7.47
    15 to 44 years          19.69    16.88    18.98    17.73
    45 to 64 years           9.52     9.47     8.69     8.31
    65 years and over        5.59     3.85     3.42     4.65
  Non-White
    Less than 15 years       2.05     3.00     3.80     3.34
    15 to 44 years           3.30     5.14     6.85     6.65
    45 to 64 years           1.16     2.09     2.57     2.35
    65 years and over         .57      .84      .85      .97


Table 3. Comparison of Alternative Estimates for Baltimore SMSA and Component Counties

Columns:
  (1) Telephone survey: estimate
  (2) Telephone survey: approximate σ
  (3) Synthetic estimate based on Maryland population figures
  (4) Synthetic estimate based on census population figures

Item and area             (1)      (2)      (3)      (4)

Restricted-activity days per person per year
  Total SMSA            18.15    0.608    17.97    18.11
  Anne Arundel          20.40    1.877    16.82    16.86
  Baltimore City        19.78    1.001    19.40    19.74
  Baltimore County      17.57    1.132    17.67    17.55
  Carroll               15.20    1.947    17.08    17.43
  Harford               11.60    1.094    16.42    16.37
  Howard                14.23    1.821    16.23    15.87

Bed-disability days per person per year
  Total SMSA             7.58    0.561     6.99     7.06
  Anne Arundel           6.99    1.290     6.44     6.51
  Baltimore City         9.03    1.022     7.75     7.90
  Baltimore County       7.87    1.145     6.74     6.66
  Carroll                4.59    1.350     6.52     6.65
  Harford                4.85    1.033     6.26     6.37
  Howard                 3.93    0.902     6.26     6.11

Work-loss days per person per year
  Total SMSA             4.11    0.404     3.10     3.05
  Anne Arundel           5.56    1.282     3.05     2.89
  Baltimore City         4.94    0.735     3.28     3.31
  Baltimore County       3.32    0.670     2.98     2.94
  Carroll                2.68    0.985     2.85     2.87
  Harford                1.71    0.641     2.98     2.70
  Howard                 2.76    0.834     3.01     2.90

School-loss days per person per year
  Total SMSA             1.39     .187      .96     1.01
  Anne Arundel           1.58     .673      .99     1.13
  Baltimore City         1.15     .243      .93      .92
  Baltimore County       1.60     .376      .95      .98
  Carroll                1.26     .472     1.02     1.01
  Harford                1.55     .418     1.04     1.23
  Howard                 1.31     .400     1.08     1.18

Proportion limited in activity
  Total SMSA             .162     .005     .134     .135
  Anne Arundel           .137     .014     .122     .120
  Baltimore City         .190     .009     .146     .151
  Baltimore County       .150     .010     .135     .134
  Carroll                .141     .015     .127     .132
  Harford                .162     .015     .117     .113
  Howard                 .126     .015     .112     .107

Table 3. Comparison of Alternative Estimates for Baltimore SMSA and Component Counties-Continued


Telephone  survey 

Synthetic  estimate 

Item  and  area 

Estimate 

Approximate σ

Based  on 

Maryland 

population 

figures 

Based  on 

Census 

population 

figures 

Regression 
estimate 

Proportion  unable  to  carry  on  major  activity 

Total  SMSA 

.040 
.038 

.003 
.008 

.036 
.030 

.037 
.030 

.036 

Anne  Arundel 

.028 

Baltimore  City 

.049 

.005 

.044 

.046 

.052 

Baltimore  County 

.039 

.006 

.035 

.034 

.029 

Carroll 

.022 
.025 

.006 
.006 

.032 
.028 

.035 
.027 

.024 

Harford 

.028 

Howard 

.021 

.006 

.026 

.025 

.024 

Proportion  with  limitation  of  activity  for  duration 

1  year  or  longer 

Total  SMSA 

.134 
.108 

.005 
.012 

.115 
.104 

.116 
.103 

.118 

Anne  Arundel 

.104 

Baltimore  City 

.162 
.122 

.009 
.009 

.126 
.116 

.130 
.115 

.139 

Baltimore  County 

.110 

Carroll 

.120 

.014 

.109 

.114 

.098 

.138 

.014 

.100 

.097 

.103 

Howard 

.092 

.013 

.095 

.091 

.096 

Number  of  doctor  visits  per  person  per  year 

Total  SMSA     

3.30 
3.48 

.03 
.33 

3.65 
3.55 

3.67 
3.59 

3.72 

Anne  Arundel 

3.50 

Baltimore  City 

3.40 
3.05 

.19 
.23 

3.74 
3.67 

3.75 
3.66 

3.94 

Baltimore  County 

3.75 

Carroll 

3.13 
3.03 

.34 
.34 

3.62 
3.53 

3.63 
3.56 

3.22 

Harford 

3.39 

Howard 

3.88 

.36 

3.51 

3.47 

3.64 

Proportion  of  persons  with  one  or  more  doctor  visits 

in  the  last  year 

Total  SMSA 

.746 
.750 
.724 
.762 

.006 
.017 
.010 
.012 

.739 
.738 
.736 
.744 

.740 
.743 
.735 
.744 

.736 

Anne  Arundel 

.740 

Baltimore  City 

.741 

Baltimore  County 

.731 

Carroll 

.765 

.018 

.744 

.743 

.722 

Harford 

.744 

.018 

.740 

.741 

.730 

Howard 

.784 

.018 

.739 

.739 

.746 

Table 3. Comparison of Alternative Estimates for Baltimore SMSA and Component Counties-Continued


Telephone  survey 

Synthetic  estimate 

Based  on 

Based  on 

Regression 

Item  and  area 

Estimate 

Approximate σ

Maryland 

population 

figures 

Census 

population 

figures 

estimate 

Number  of  dental  visits  per  person  per  year 

Total  SMSA 

1.88 
2.20 

.01 
.13 

1.54 
1.63 

1.53 
1.63 

1.64 

Anne  Arundel 

1.77 

Baltimore  City 

1.57 

.08 

1.33 

1.30 

1.39 

Baltimore  County 

2.18 

.09 

1.66 

1.69 

1.86 

Carroll 

1.69 

.14 

1.70 

1.67 

1.51 

Harford 

1.73 

.13 

1.67 

1.64 

1.65 

Howard 

1.63 

.14 

1.63 

1.65 

1.96 

Average  length  of  stay  in  a  hospital 

Total  SMSA 

8.38 
7.34 

.74 
1.90 

5  85 

Anne  Arundel 

4.71 

Baltimore  City 

9.51 
8.01 

1.12 
1.54 

NOT 
COMPUTED 

NOT 
COMPUTED 

7  47 

Baltimore  County 

5.08 

Carroll 

7.45 

2.30 

4.22 

Harford 

6.85 

2.01 

4.42 

Howard 

8.21 

2.11 

5  24 

Proportion  of  persons  with  one  or  more  hospital 

episodes  in  the  last  year 

Total  SMSA 

.108 
.113 
.115 
.097 

.004 
.013 
.007 
.008 

.105 
.101 
.107 
.106 

.104 
.101 
.108 
.105 

093 

Anne  Arundel 

090 

Baltimore  City 

094 

Baltimore  County 

.091 

Carroll 

.096 

.012 

.103 

.104 

.094 

Harford 

.111 

.013 

.100 

.099 

.109 

Howard 

.118 

.014 

.099 

.096 

.074 

per  year 

Total  SMSA 

12.58 
13.07 

0.74 
2.07 

13.10 
12.61 

13.09 
12.56 

12  09 

Anne  Arundel 

11.93 

Baltimore  City 

13.16 
11.54 

1.20 
1.48 

13.46 
13.25 

13.54 
13.14 

12  33 

Baltimore  County 

11.95 

Carroll 

11.62 

2.19 

12.93 

13.02 

11.99 

Harford 

12.50 

2.14 

12.43 

12.32 

14.01 

Howard 

13.73 

2.30 

12.32 

11.92 

9.51 

Table 3. Comparison of Alternative Estimates for Baltimore SMSA and Component Counties-Continued

Telephone  survey 

Synthetic  estimate 

Item  and  area 

Estimate 

Approximate σ

Based  on 

Maryland 

population 

Figures 

Based  on 

Census 

population 

figures 

Regression 
estimate 

Number  of  short-stay  hospital  days  per  100  persons 

per  year 

Total  SMSA 

105.37 
95.87 

7.26 
20.18 

103.20 
94.36 

103.92 
93.44 

118  85 

Anne  Arundel 

101.62 

Baltimore  City 

125.19 
92.42 

11.71 
14.43 

113.95 
101.33 

117.16 
99.54 

135.19 

Baltimore  County 

120.72 

Carroll 

86.64 

21.38 

94.42 

99.25 

100.49 

Harford 

85.60 

20.80 

90.67 

89.34 

95.95 

Howard 

112.80 

22.37 

89.59 

85.52 

94.36 

Visits  to  doctor's  office  per  person  per  year 

Total  SMSA 

2.878 
3.705 

.150 
.463 

3.289 
3.286 

3.296 
3.327 

3.570 

Anne  Arundel 

3.652 

Baltimore  City 

2.532 
2.839 

.240 
.282 

3.152 
3.442 

3.139 
3.462 

3.292 

Baltimore  County 

3.859 

Carroll 

2.623 

.428 

3.423 

3.427 

3.403 

Harford 

2.636 

.376 

3.304 

3.314 

3.497 

Howard 

3.411 

.479 

3.242 

3.229 

3.865 

Visits  to  emergency  room  per  person  per  year 

Total  SMSA 

.164 
.081 

.035 
.065 

.245 
.239 

.245 
.238 

.241 

Anne  Arundel 

.161 

Baltimore  City 

.182 
.216 

.062 
.076 

.267 
.226 

.269 
.222 

.315 

Baltimore  County 

.229 

Carroll 

.101 

.074 

.226 

.226 

.245 

Harford 

.138 

.078 

.236 

.239 

.190 

Howard 

.073 

.066 

.245 

.245 

.192 

Visits  to  out-patient  clinic  per  person  per  year 

Total  SMSA 

.616 
.679 

.088 
.306 

.484 
.420 

.489 
.420 

.526 

.500 

Baltimore  City 

.772 

.164 

.611 

.629 

.846 

Baltimore  County 

.367 
.130 

.102 
.086 

.410 
.382 

.393 
.398 

.363 

Carroll 

.279 

Harford 

.663 

.181 

.397 

.403 

.492 

Howard 

1.095 

.381 

.420 

.398 

.691 

Visits  to  general  practitioner  per  person  per  year 

Total  SMSA 

1.941 

.125 

2.479 

2.485 

2.166 

Anne  Arundel 

2.111 

.381 

2.410 

2.408 

2.245 

Baltimore  City 

2.106 
1.622 

.203 
.226 

2.540 
2.492 

2.557 
2.483 

2.410 

Baltimore  County 

2.603 

Carroll 

2.116 

.406 

2.450 

2.467 

2.780 

Harford 

1.862 

.345 

2.587 

2.371 

2.376 

Howard 

2.036 

.434 

2.365 

2.329 

2.057 

Table 3. Comparison of Alternative Estimates for Baltimore SMSA and Component Counties-Continued


Telephone  survey 

Synthetic  estimate 

Item  and  area 

Estimate 

Approximate σ

Based  on 

Maryland 

population 

figures 

Based  on 

Census 

population 

figures 

Regression 
estimate 

Visits  to  selected  practitioners  per  person  per  year 

Total  SMSA 

3.245 
3.968 

.172 
.586 

3.709 
3.728 

3.818 

Anne  Arundel 

3.647 

Baltimore  City 

2.98 
3.138 

.247 
.314 

NOT 

3.631 
3.786 

3  753 

Baltimore  County 

4.067 

Carroll 

3.107 

.499 

COMPUTED 

3.761 

3.764 

Harford 

2.985 

.442 

3.746 

3.700 

Howard 

3.835 

.715 

3.640 

3.819 

Visits  for  diagnosis  or  treatment  per  person  per  year 

Total  SMSA 

3.843 
4.716 

.208 
.700 

4.110 
4.053 

4.125 
4.099 

4  620 

Anne  Arundel 

4.410 

Baltimore  City 

3.585 
3.695 

.321 
.374 

4.078 
4.204 

4.081 
4.215 

4  823 

Baltimore  County 

4.559 

Carroll 

3.229 

.519 

4.162 

4.182 

4.128 

Harford 

3.299 

.450 

4.052 

4.067 

4.373 

Howard 

4,828 

.776 

4.002 

3.986 

4.880 

Visits  per  chronic  condition  per  person  per  year 

Total  SMSA 

1.713 
2.101 

.152 
.537 

2.131 
2.024 

2.143 
2.021 

2  208 

2.062 

Baltimore  City 

1.710 
1.625 

.238 
.265 

2.200 
2.180 

2.240 
2.169 

2  591 

Baltimore  County 

2.242 

Carroll 

1.527 

.361 

2.091 

2.132 

1.965 

1.431 

.341 

1.984 

1.954 

2.076 

Howard 

1.557 

.417 

1.937 

1.879 

2.296 

Table 4. Characteristics of Regression Estimates


Estimate  of 

Contribution 

independent 

to  total 

Item 

Independent  Variables 

Coefficients 

variables  for 

Baltimore 

SMSA 

estimate  for 

Baltimore 

SMSA 

R²

Proportion  of  people  with  one  or 

Intercept 

.1260 

.143 

more  hospital  episodes  in  last  year  .  . 

Percent  high  school  educated 
Number  of  hospital  beds  per 

-.0420 

.5911 

.35 

100,000  population 

.000046 

452.708 

.29 

Number  of  M.D.'s  per  100,000 

population 

-.0001 

258.394 

.36 

Number  of  short-stay  hospital 

Intercept 

18.8794 

.130 

episodes  per  100  persons  per  year .  . 

Percent  high  school  educated 
Number  of  hospital  beds  per 

-9.5157 

.5911 

.46 

100,000  population 

.0061 

452.708 

.23 

Number  of  M.D.'s  per  100,000 

population 

-.0152 

258.394 

.31 

Number  of  hospital  days  per  100 

Intercept 

--278.78 

.109 

persons  per  year 

Synthetic  estimate 

5.3676 

103.20 

.77 

Percent  high  school  educated 

-78.281 

.5911 

.07 

Percent  over  65  years 

-967.32 

.0939 

.13 

Percent  non-White 

-76.631 

.2506 

.03 

Average  length  of  stay  in  hospital  .  .  . 

Intercept 

3.5210 

.087 

Percent  blue  collar 

-9.7882 

.0947 

.22 

Number  of  M.D.'s  per  100,000 

population 

.0047 

258.394 

.29 

Percent  over  65  years 

21.7070 

.0939 

.49 

Restricted-activity  days  per  person 

Intercept 

-13.840 

.048 

per  year  

Synthetic  estimate 

2.578 

17.97 

.79 

Percent  blue  collar 

24.588 

.0947 

.05 

Percent  high  school  educated 

-8.382 

.5911 

.08 

Percent  over  65  years 

-52.290 

.0939 

.08 

Bed-disability  days  per  person 

Intercept 

-9.4183 

.098 

per  year  

Synthetic  estimate 

3.1067 

6.99 

.83 

Percent  blue  collar 

-12.1567 

.0947 

.04 

Percent  high  school  educated 

-3.6770 

.5911 

.08 

Percent  over  65  years 

-14.0559 

.0939 

.05 

Work-loss  days  per  person  per  year  .  . 

Intercept 

-2.8525 

.022 

Synthetic  estimate 

1 .3834 

3.10 

.70 

Percent  blue  collar 

7.5429 

.0947 

.12 

Per  capita  income 

.0002 

5339.26 

.18 

School-loss  days  per  person  per  year  . 

Intercept 

.5357 

.002 

Synthetic  estimate 

.3718' 

.96 

.65 

Number  of  hospital  beds  per 

100,000  population 

.0002 

452.708 

.16 

Number  of  M.D.'s  per  100,000 

population 

-.0004 

258.394 

.19 

Table  4.  Characteristics  of  Regression  Estimates— Continued 

Estimate of

Contribution 

independent 

to  total 

Item 

Independent  Variables 

Coefficients 

variables  for 

Baltimore 

SMSA 

estimate  for 

Baltimore 

SMSA 

R²

Proportion  of  people  limited  in 

Intercept 

.1190 

.195 

activity 

Synthetic  estimate 

.9705 

.134 

.54 

Percent  blue  collar 

-.2354 

.0947 

.09 

Percent  high  school  educated 

-.0636 

.5911 

.16 

Per  capita  income 

-.00000965 

5339.26 

.21 

Proportion  of  people  unable  to 

Intercept 

.0659 

.295 

carry  on  major  activity 

Synthetic  estimate 

1 .0897 

.036 

.37 

Percent  blue  collar 

-.1254 

.0947 

.11 

Percent  high  school  educated 

-  .0404 

.5911 

.22 

Per  capita  income 

-.00000616 

5339.26 

.30 

Proportion  of  people  with  limitation 

Intercept 

.1002 

.201 

of  activity  1  year  or  longer 

Synthetic  estimate 

1 .0362 

.115 

.44 

Percent  blue  collar 

-.2097 

.0947 

.36 

Percent  high  school  educated 

-.0534 

.5911 

.02 

Per  capita  income 

-.00000934 

5339.26 

.18 

Number  of  doctor  visits  per  person 

Intercept 

-3.1413 

.043 

per  year 

Synthetic  estimate 

1.7096 

3.65 

.87 

Percent  blue  collar 

-2.4144 

.0947 

.03 

Percent  non-White 

.5780 

.2506 

.02 

Per  capita  income 

.0001 

5339.26 

.08 

Proportion  of  people  with  at  least 

Intercept 

.5790 

.187 

one  doctor  visit 

Percent  high  school  educated 

.1469 

.5911 

.55 

Percent  non-White 

0678 

.2506 

.11 

Per  capita  income 

.00001 

5339.26 

.34 

Number  of  dental  visits  per  person 

Intercept 

-.4352 

.163 

per  year 

Percent  high  school  educated 

1.0760 

.5911 

.31 

Per  capita  income 

.00027 

5339.26 

.69 

Number  of  doctor  visits  in  doctor's 

Intercept 

1.1522 

office  per  person  per  year 

Percent  married  over  17  years 

2.4374 

.4477 

.35 

Percent  high  school  educated 

-.5691 

.5911 

.11 

.037 

Percent  non-White 

.5514 

.2506 

.04 

Per  capita  income 

-.000285 

5339.26 

.49 

Number  of  doctor  visits  in  emer- 

Intercept 

.6941 

gency  room  per  person  per  year  .  .  . 

Percent  married  over  17  years 

-1.3580 

.4477 

.72 

Percent  high  school  educated 

.1967 

.5911 

.14 

.044 

Percent  blue  collar 

.8574 

.0947 

.10 

Number  of  hospital  beds  per 

100,000  population 

-.0000939 

452.708 

.05 



Table  4.  Characteristics  of  Regression  Estimates— Continued 


Estimate  of 

Contribution 

independent 

to  total 

Item 

Independent  Variables 

Coefficients 

variables  for 

Baltimore 

SMSA 

estimate  for 

Baltimore 

SMSA 

R²

Number  of  doctor  visits  in  out 

Intercept 

.3584 

clinic  per  person  per  year 

Synthetic  estimate 

.8004 

.398 

.18 

Percent  blue  collar 

-1.6811 

.0947 

.20 

.121 

Percent  over  65  years 

-2.0935 

.0939 

.26 

Number  of  M.D.'s  per  100,000 

population 

.000794 

358.394 

.36 

Number  of  visits  to  general  prac- 

Intercept 

-.9497 

titioners  per  person  per  year 

Synthetic  estimate 

1.4914 

2.32 

.75 

Percent  high  school  educated 

-1.0206 

.5911 

.13 

.059 

Percent  blue  collar 

4.1333 

.0947 

.09 

Percent  non-White 

-.5295 

.2506 

.03 

Number  of  visits  to  selected  prac- 

Intercept 

2.1513 

titioners  per  person  per  year 

Percent  high  school  educated 

-.7778 

.5911 

.23 

Percent  blue  collar 

3.5755 

.0947 

.11 

.015 

Number  of  hospital  beds  per 

100,000  population 

.000256 

452.708 

.07 

Per  capita  income 

.000312 

5339.26 

.59 

Number  of  visits  for  diagnosis  or 

Intercept 

3.0038 

treatment  per  person  per  year  .... 

Percent  over  65  years 

-2.1623 

.0939 

.11 

Percent  non-White 

.6569 

.2506 

.09 

.046 

Number  of  M.D.'s  per  100,000 

population 

.00113 

258.395 

.16 

Per  capita  income 

.000225 

5339.26 

.64 

Number  of  visits  for  chronic 

Intercept 

.3146 

condition  per  person  per  year 

Synthetic  estimate 

.4275 

1.879 

.42 

Percent  non-White 

.6127 

.2506 

.08 

.039 

Number  of  M.D.'  per  100,000 

population 

.000716 

258.395 

.10 

Per  capita  income 

.000146 

5339.26 

.40 

Table 5. Average Mean Square Error of Synthetic Estimates

Columns:
  (1) U.S. estimate
  (2) Average mean square error
  (3) Root mean square error
  (4) Relative root mean square error

Item                                                               (1)       (2)       (3)       (4)

Restricted-activity days per person per year                      17.8     68.86      8.30      .466
Bed-disability days per person per year                            6.9     11.55      3.40      .493
Work-loss days per person per year                                2.12      4.51      2.12      1.00
School-loss days per person per year                              1.05     .9853      .993      .946
Proportion limited in activity                                    .135     .0018      .042      .311
Proportion unable to carry on major activity                     *.033     .0004      .020      .606
Proportion with limitation of activity for duration of
  1 year or longer                                                (NA)     .0015      .039      (NA)
Number of doctor visits per person per year (annual recall)       (NA)      1.04      1.02      (NA)
Proportion of people with one or more doctor's visits
  in last year                                                  **.752     .0043      .066      .088
Number of dental visits per person per year                        1.6     .7217      .850      .531
Number of short-stay hospital episodes per 100 persons
  per year                                                        14.0     33.41      5.78      .413
Number of short-stay hospital days per 100 persons
  per year                                                       109.2   3045.11     55.18      .505
Proportion of persons with one or more hospital episodes
  in the last year                                                .104     .0011      .033      .317
Visits in doctor's office per person per year                   *3.466    1.3710     1.171      .338
Visits in emergency room per person per year                      (NA)     .1413      .376      (NA)
Visits in out-patient clinic per person per year                  (NA)     .1999       .47      (NA)
Visits to general practitioner per person per year              *2.570    1.3466     1.160      .451
Visits to selected practitioner per person per year             *3.575    1.5054     1.227      .343
Visits for diagnosis or treatment per person per year          *4.3301    2.0018     1.415      .327
Visits for chronic condition per person per year                *2.581     .9090      .953      .369

*  1974 Estimate.
** 1975 Estimate.

NA Not available.

Table 6. Comparison of Alternative Estimates for 20 Largest SMSA's for Selected Items


Proportion of people limited in activity

Area                       PSU                                Direct HIS  Synthetic
                                                        Estimate       σ   estimate
Chicago                    308, [illegible] (combined)      .100    .006       .133
Los Angeles                702, 762 (combined)              .124    .007       .136
Boston (Suffolk Co. only)  116                              .092    .018       .122
Philadelphia               111                              .129    .011       .144
                           181                              .146    .024       .129
New York                   110                              .181    .025       .161
                           190                              .151    .015       .141
                           192                              .161    .023       .142
                           193                              .091    .011       .157
                           194                              .093    .008       .135
Detroit                    309                              .131    .010       .132
San Francisco              703                              .153    .014       .140
Washington, D.C.           511                              .143    .028       .149
                           541                              .092    .015       .116
                           542                              .109    .016       .118
Dallas                     503                              .081    .010       .121
St. Louis                  306                              .115    .014       .138
                           386                              .173    .039       .136
Pittsburgh                 115                              .112    .012       .148
Houston                    509                              .133    .014       .116
Baltimore                  510                              .119    .013       .135
Minneapolis                302                              .150    .018       .124
Newark                     195                              .148    .018       .141
Cleveland                  307                              .096    .011       .140
Atlanta                    508                              .131    .017       .121
Anaheim                    719                              .092    .011       .122
San Diego                  709                              .140    .018       .133

Bed disability days per person per year

Area                       PSU                                Direct HIS  Synthetic  Regression
                                                        Estimate       σ   estimate    estimate
Chicago                    308, [illegible] (combined)      6.33     .73       6.93        6.95
Los Angeles                702, 762 (combined)              7.83     .88       6.92        6.77
Boston (Suffolk Co. only)  116                              6.73    2.46       7.11        7.40
Philadelphia               111                              6.45    1.03       7.20        7.59
                           181                              7.29    2.26       6.76        6.43
New York                   110                              8.44    2.23       7.71        9.51
                           190                             13.23    2.52       7.28        8.39
                           192                              8.46    2.32       7.36        8.88
                           193                              6.10    1.36       7.40        8.27
                           194                              6.79    1.08       6.78        6.30
Detroit                    309                             11.17    1.69       6.89        6.85
San Francisco              703                              7.49    1.29       7.03        6.95
Washington, D.C.           511                              7.21    2.72       8.16       10.85
                           541                              4.92    1.55       6.41        5.88
                           542                              5.60    1.58       6.53        6.08
Dallas                     503                              8.77    2.02       6.63        6.01
St. Louis                  306                              7.60    1.73       7.04        7.05
                           386                              9.59    4.14       6.99        7.16
Pittsburgh                 115                              5.63    1.14       7.08        6.94
Houston                    509                              5.50    1.07       6.50        6.05
Baltimore                  510                              4.31     .90       7.06        7.83
Minneapolis                302                              5.71    1.28       6.51        5.09
Newark                     195                             10.68    2.46       7.15        7.53
Cleveland                  307                              7.12    1.56       7.09        7.22
Atlanta                    508                              6.43    1.59       6.69        6.54
Anaheim                    719                              4.61    1.06       6.41        5.40
San Diego                  709                              6.60    1.62       6.82        6.25
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Table  6.  Comparison  of  Alternative  Estimates  for  20  Largest  SMSA's  for  Selected  Items— Continued 


Proportion of people with at least one doctor visit in past year

Area                       PSU                                Direct HIS  Synthetic  Regression
                                                        Estimate       σ   estimate    estimate
Chicago                    308, [illegible] (combined)      .734    .010       .742        .752
Los Angeles                702, 762 (combined)              .731    .010       .742        .753
Boston (Suffolk Co. only)  116                              .716    .030       .743        .724
Philadelphia               111                              .787    .014       .742        .744
                           181                              .804    .029       .745        .741
New York                   110                              .811    .024       .741        .757
                           190                              .764    .017       .742        .731
                           192                              .731    .023       .742        .710
                           193                              .714    .018       .744        .734
                           194                              .755    .013       .745        .760
Detroit                    309                              .780    .013       .741        .746
San Francisco              703                              .791    .016       .740        .772
Washington, D.C.           511                              .731    .031       .731        .788
                           541                              .774    .028       .746        .776
                           542                              .819    .026       .742        .787
Dallas                     503                              .711    .019       .742        .758
St. Louis                  306                              .771    .020       .744        .745
                           386                              .781    .038       .745        .728
Pittsburgh                 115                              .701    .016       .746        .728
Houston                    509                              .743    .017       .739        .749
Baltimore                  510                              .766    .018       .740        .738
Minneapolis                302                              .796    .020       .745        .765
Newark                     195                              .731    .019       .742        .760
Cleveland                  307                              .751    .019       .743        .742
Atlanta                    508                              .766    .022       .739        .764
Anaheim                    719                              .744    .019       .744        .759
San Diego                  709                              .769    .022       .748        .755

Number of visits to doctor's office per person per year

Area                       PSU                                Direct HIS  Synthetic
                                                        Estimate       σ   estimate
Chicago                    308, [illegible] (combined)    3.1064     .22     3.3279
Los Angeles                702, 762 (combined)            3.7735     .26     3.2628
Boston (Suffolk Co. only)  116                            2.3000     .51     3.3765
Philadelphia               111                            4.3027     .43     3.3829
                           181                            5.3912    1.04     3.3894
New York                   110                            3.5824     .58     3.4168
                           190                            3.2768     .39     3.3170
                           192                            3.2148     .55     3.3041
                           193                            1.8038     .25     3.4945
                           194                            3.7432     .37     3.4537
Detroit                    309                            4.1211     .39     3.3128
San Francisco              703                            4.4083     .47     3.3573
Washington, D.C.           511                            3.3262     .78     2.9987
                           541                            4.1166     .44     3.3712
                           542                            3.7907     .66     3.2719
Dallas                     503                            3.2234     .46     3.2825
St. Louis                  306                            3.1481     .44     3.3801
                           386                            4.1039    1.09     3.3840
Pittsburgh                 115                            2.4009     .30     3.5291
Houston                    509                            3.0075     .36     3.2181
Baltimore                  510                            2.3775     .31     3.2965
Minneapolis                302                            3.3466     .46     3.4124
Newark                     195                            3.8510     .55     3.3640
Cleveland                  307                            2.7351     .37     3.3958
Atlanta                    508                            3.0500     .47     3.2142
Anaheim                    719                            3.6602     .52     3.3939
San Diego                  709                            4.1989     .64     3.7607
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